Assistant Commissioner for Patents
Sir:

PATENT APPLICATION TRANSMITTAL LETTER 

Transmitted herewith for filing, please find 

A Utility Patent Application under 37 C.F.R. 1.53(b).
It is a continuing application, as follows: 

[ ] continuation  [ ] divisional  [X] continuation-in-part of prior application number
PCT/IB98/01665 filed October 9, 1998.



[ ] A Provisional Patent Application under 37 C.F.R. 1.53(c).
[ ] A Design Patent Application (submitted in duplicate).
Including the following: 

IZ3 Provisional Application Cover Sheet. 

[X] New or Revised Specification, including pages 1 to 518 containing:

[X] Specification
[X] Claims
Abstract

CH Substitute Specification, including Claims and Abstract. 

O The present application is a continuation application of Application 

No. filed ._ The present application includes the 

Specification of the parent application which has been revised in 
accordance with the amendments filed in the parent application. Since 
none of those amendments incorporate new matter into the parent 
application, the present revised Specification also does not include new 
matter. 

The present application is a continuation application of Application 

No. filed ^ which in turn is a continuation-in-part of 

Application No. filed ._ The present application 

includes the Specification of the parent application which has been 
revised in accordance with the amendments filed in the parent 
application. Although the amendments in the parent C-I-P application 
may have incorporated new matter, since those are the only revisions 
included in the present application, the present application includes no 
new matter in relation to the parent application. 

EH A copy of earlier application Serial No. Filed 

including Specification, Claims and Abstract (pages 1 - @@), to which no new 
matter has been added TOGETHER WITH a copy of the executed oath or declaration 
for such earlier application and all drawings and appendices. Such earlier application 
is hereby incorporated into the present application by reference. 



CD Please enter the following amendment to the Specification under the Cross-Reference 
to Related Applications section (or create such a section): "This Application:

□ is a continuation of is a divisional of claims benefit of U.S. provisional 

Application Serial No. filed 



EH Signed Statement attached deleting inventor(s) named in the prior application. 
EH A Preliminary Amendment. 

Twenty-seven (27) Sheets of [ ] Formal [X] Informal Drawings.
CD Petition to Accept Photographic Drawings. 

□ Petition Fee 

An [ ] Executed [X] Unexecuted Declaration or Oath and Power of Attorney.
CH An Associate Power of Attorney. 

E3 AnD Executed O Copy of Executed Assignment of the Invention to 

d A Recordation Form Cover Sheet. 
□ Recordation Fee - $40.00. 
The prior application is assigned of record to 

Priority is claimed under 35 U.S.C. § 119 of Patent Application No.
PCT/IB98/01665 filed October 9, 1998.

□ A Certified Copy of each of the above applications for which priority is 
claimed: 
n is enclosed. 

CI has been filed in prior application Serial No. filed . 

D AnD Executed or tZl Copy of Executed Earlier Statement Claiming Small Entity 



□ 
Status under 37 C.F.R. 1.9 and 1.27
is enclosed. 

has been filed in prior application Serial No. filed x 

said status is still proper and desired in present case. 

□ Diskette Containing DNA/ Amino Acid Sequence Information. 

□ Statement to Support Submission of DNA/ Amino Acid Sequence Information. 

The computer readable form in this application ^ is identical with that filed 

in Application Serial Number , filed ^ In accordance with 37 

CFR 1.821(e), please use the L~H first-filed, d last-filed or EH only computer 
readable form filed in that application as the computer readable form for the instant 
application. It is understood that the Patent and Trademark Office will make the 
necessary change in application number and filing date for the computer readable 
form that will be used for the instant application. A paper copy of the Sequence 

Listing is included in the originally-filed specification of the instant application, 

included in a separately filed preliminary amendment for incorporation into the 
specification. 

D Information Disclosure Statement. 

□ Attached Form 1449. 

Copies of each of the references listed on the attached Form PTO-1449 are 
enclosed herewith. 

□ A copy of Petition for Extension of Time as filed in the prior case. 

□ Appended Material as follows: ._ 

Return Receipt Postcard (should be specifically itemized). 



□ 



Other as follows: 

FEE CALCULATION:



Cancel in this application original claims of the prior application before 

calculating the filing fee. (At least one original independent claim must be retained 
for filing purposes.) 



                                                  SMALL ENTITY          NOT SMALL ENTITY
                                                  RATE        FEE       RATE        FEE
PROVISIONAL APPLICATION                           $75.00      $         $150.00     $
DESIGN APPLICATION                                $155.00     $         $310.00     $
UTILITY APPLICATIONS BASE FEE                     $380.00     $         $760.00     $760
UTILITY APPLICATION; ALL CLAIMS CALCULATED
AFTER ENTRY OF ALL AMENDMENTS
                        No. Filed    No. Extra
  TOTAL CLAIMS          24 - 20 =    4            $9 each     $         $18 each    $72
  INDEP. CLAIMS         5 - 3 =      2            $39 each    $         $78 each    $156
FIRST PRESENTATION OF MULTIPLE DEPENDENT CLAIM    $130        $         $260        $260
ADDITIONAL FILING FEE                                         $                     $
TOTAL FILING FEE DUE                                          $                     $1248



A Check is enclosed in the amount of $1,248.
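
By way of illustration only, the arithmetic behind the total filing fee can be sketched as follows. The rates are the "not small entity" figures stated in the fee calculation of this letter; the function name and its parameters are hypothetical and are introduced purely for this example.

```python
def filing_fee(total_claims, indep_claims, multiple_dependent,
               base=760, per_extra_claim=18, per_extra_indep=78,
               multi_dep_fee=260):
    """Utility filing fee from the rates stated in this letter (hypothetical helper)."""
    # Only claims beyond 20 in total, and beyond 3 independent, attract a fee.
    extra_claims = max(0, total_claims - 20)
    extra_indep = max(0, indep_claims - 3)
    fee = base + extra_claims * per_extra_claim + extra_indep * per_extra_indep
    if multiple_dependent:
        fee += multi_dep_fee
    return fee


# 24 total claims, 5 independent claims, multiple dependent claim present:
# 760 + 4*18 + 2*78 + 260
print(filing_fee(24, 5, True))
```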

£3 The Commissioner is authorized to charge payment of the following fees and to 

refund any overpayment associated with this communication or during the pendency 
of this application to deposit account 23-3050. This sheet is provided in duplicate. 



[ZI The foregoing amount due. 

Any additional filing fees required, including fees for the presentation of extra 
claims under 37 C.F.R. 1.16. 

[X] Any additional patent application processing fees under 37 C.F.R. 1.17 or
1.20(d).

□ The issue fee set in 37 C.F.R. 1.18 at the mailing of the Notice of Allowance. 

The Commissioner is hereby requested to grant an extension of time for the 
appropriate length of time, should one be necessary, in connection with this filing or 
any future filing submitted to the U.S. Patent and Trademark Office in the above-identified
application during the pendency of this application. The Commissioner is
further authorized to charge any fees related to any such extension of time to deposit 
account 23-3050. This sheet is provided in duplicate. 

SHOULD ANY DEFICIENCIES APPEAR with respect to this application, including 
deficiencies in payment of fees, missing parts of the application or otherwise, the United 
States Patent and Trademark Office is respectfully requested to promptly notify the 
undersigned. 



Woodcock Washburn Kurtz 
Mackiewicz & Norris LLP 
One Liberty Place - 46th Floor 
Philadelphia PA 19103 
Telephone: (215)568-3100 
Facsimile: (215) 568-3439 





Mark J. Rosen 
Registration No. 39,822 

NEISSERIAL ANTIGENS

This application is a continuation-in-part of international patent application PCT/IB98/01665, filed
October 9, 1998, from which priority is claimed under 35 U.S.C. § 119.

This invention relates to antigens from Neisseria bacteria. 

BACKGROUND ART 

Neisseria meningitidis and Neisseria gonorrhoeae are non-motile, Gram-negative diplococci that
are pathogenic in humans. N. meningitidis colonises the pharynx and causes meningitis (and,
occasionally, septicaemia in the absence of meningitis); N. gonorrhoeae colonises the genital tract
and causes gonorrhea. Although they colonise different areas of the body and cause completely
different diseases, the two pathogens are closely related. One feature that clearly
differentiates meningococcus from gonococcus is the polysaccharide capsule that is
present in all pathogenic meningococci.

N. gonorrhoeae caused approximately 800,000 cases per year during the period 1983-1990 in the
United States alone (chapter by Meitzner & Cohen, "Vaccines Against Gonococcal Infection", In:
New Generation Vaccines, 2nd edition, ed. Levine, Woodrow, Kaper, & Cobon, Marcel Dekker,
New York, 1997, pp. 817-842). The disease causes significant morbidity but limited mortality.
Vaccination against N. gonorrhoeae would be highly desirable, but repeated attempts have failed.
The main candidate antigens for this vaccine are surface-exposed proteins such as pili, porins,
opacity-associated proteins (Opas) and other surface-exposed proteins such as the Lip, Laz, IgA1
protease and transferrin-binding proteins. The lipooligosaccharide (LOS) has also been suggested
as a vaccine (Meitzner & Cohen, supra).

N. meningitidis causes both endemic and epidemic disease. In the United States the attack rate is
0.6-1 per 100,000 persons per year, and it can be much greater during outbreaks (see Lieberman
et al. (1996) Safety and Immunogenicity of a Serogroups A/C Neisseria meningitidis
Oligosaccharide-Protein Conjugate Vaccine in Young Children. JAMA 275(19):1499-1503;
Schuchat et al (1997) Bacterial Meningitis in the United States in 1995. N Engl J Med 337(14):970-976).
In developing countries, endemic disease rates are much higher and during epidemics
incidence rates can reach 500 cases per 100,000 persons per year. Mortality is extremely high, at
10-20% in the United States, and much higher in developing countries. Following the introduction
of the conjugate vaccine against Haemophilus influenzae, N. meningitidis is the major cause of
bacterial meningitis at all ages in the United States (Schuchat et al (1997) supra).

Based on the organism's capsular polysaccharide, 12 serogroups of N. meningitidis have been 
identified. Group A is the pathogen most often implicated in epidemic disease in sub-Saharan 
Africa. Serogroups B and C are responsible for the vast majority of cases in the United States and 
in most developed countries. Serogroups W135 and Y are responsible for the rest of the cases in 
the United States and developed countries. The meningococcal vaccine currently in use is a 
tetravalent polysaccharide vaccine composed of serogroups A, C, Y and W135. Although 
efficacious in adolescents and adults, it induces a poor immune response and short duration of 
protection, and cannot be used in infants [eg. Morbidity and Mortality weekly report, Vol.46, No. 
RR-5 (1997)]. This is because polysaccharides are T-cell independent antigens that induce a weak 
immune response that cannot be boosted by repeated immunization. Following the success of the 
vaccination against H. influenzae, conjugate vaccines against serogroups A and C have been 
developed and are at the final stage of clinical testing (Zollinger WD "New and Improved Vaccines 
Against Meningococcal Disease" in: New Generation Vaccines, supra, pp. 469-488; Lieberman et 
al (1996) supra; Costantino et al (1992) Development and phase I clinical testing of a conjugate 
vaccine against meningococcus A and C. Vaccine 10:691-698). 

Meningococcus B remains a problem, however. This serotype currently is responsible for 
approximately 50% of total meningitis in the United States, Europe, and South America. The 
polysaccharide approach cannot be used because the menB capsular polysaccharide is a polymer 
of α(2-8)-linked N-acetyl neuraminic acid that is also present in mammalian tissue. This results in
tolerance to the antigen; indeed, if an immune response were elicited, it would be anti-self, and 
therefore undesirable. In order to avoid induction of autoimmunity and to induce a protective 
immune response, the capsular polysaccharide has, for instance, been chemically modified 
substituting the N-acetyl groups with N-propionyl groups, leaving the specific antigenicity
unaltered (Romero & Outschoorn (1994) Current status of Meningococcal group B vaccine 
candidates: capsular or non-capsular? Clin Microbiol Rev 7(4):559-575). 

Alternative approaches to menB vaccines have used complex mixtures of outer membrane proteins 
(OMPs), containing either the OMPs alone, or OMPs enriched in porins, or deleted of the class 4 
OMPs that are believed to induce antibodies that block bactericidal activity. This approach
produces vaccines that are not well characterized. They are able to protect against the homologous 
strain, but are not effective at large where there are many antigenic variants of the outer membrane 
proteins. To overcome the antigenic variability, multivalent vaccines containing up to nine different 
porins have been constructed (eg. Poolman JT (1992) Development of a meningococcal vaccine.
Infect. Agents Dis. 4:13-28). Additional proteins to be used in outer membrane vaccines have been 
the opa and opc proteins, but none of these approaches have been able to overcome the antigenic 
variability (eg. Ala'Aldeen & Borriello (1996) The meningococcal transferrin-binding proteins 1 
and 2 are both surface exposed and generate bactericidal antibodies capable of killing homologous 
and heterologous strains. Vaccine 14(1):49-53).

A certain amount of sequence data is available for meningococcal and gonococcal genes and
proteins (eg. EP-A-0467714, WO96/29412), but this is by no means complete. The provision of
further sequences could provide an opportunity to identify secreted or surface-exposed proteins that
are presumed targets for the immune system and which are not antigenically variable. For instance,
some of the identified proteins could be components of efficacious vaccines against meningococcus
B, some could be components of vaccines against all meningococcal serotypes, and others could 
be components of vaccines against all pathogenic Neisseriae. 

THE INVENTION 

The invention provides proteins comprising the Neisserial amino acid sequences disclosed in the 
examples. These sequences relate to N. meningitidis or N. gonorrhoeae.

It also provides proteins comprising sequences homologous (ie. having sequence identity) to the 
Neisserial amino acid sequences disclosed in the examples. Depending on the particular sequence, 
the degree of identity is preferably greater than 50% (eg. 65%, 80%, 90%, or more). These 
homologous proteins include mutants and allelic variants of the sequences disclosed in the 
examples. Typically, 50% identity or more between two proteins is considered to be an indication
of functional equivalence. Identity between the proteins is preferably determined by the
Smith-Waterman homology search algorithm as implemented in the MPSRCH program (Oxford
Molecular), using an affine gap search with parameters gap open penalty=12 and gap extension
penalty=1.
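
By way of illustration only, a Smith-Waterman local alignment with affine gap penalties can be sketched as below. This is not the MPSRCH implementation. The match and mismatch scores are placeholder assumptions (the substitution matrix is not specified here), the function name is hypothetical, and the convention that the first residue of a gap costs the gap open penalty while each further residue costs the gap extension penalty is likewise an assumption. Identity is reported over the columns of the best local alignment.

```python
def smith_waterman_identity(a, b, match=5, mismatch=-4,
                            gap_open=12, gap_extend=1):
    """Return (best local score, percent identity over the aligned columns)."""
    neg = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j).
    # E: best score ending with a gap in `a`; F: ending with a gap in `b`.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell, counting columns and identical pairs.
    i, j = best_pos
    state = "H"
    identical = columns = 0
    while i > 0 and j > 0:
        if state == "H":
            if H[i][j] == 0:
                break
            s = match if a[i - 1] == b[j - 1] else mismatch
            if H[i][j] == H[i - 1][j - 1] + s:
                identical += a[i - 1] == b[j - 1]
                columns += 1
                i, j = i - 1, j - 1
            elif H[i][j] == E[i][j]:
                state = "E"
            else:
                state = "F"
        elif state == "E":
            columns += 1
            if E[i][j] == H[i][j - 1] - gap_open:  # the gap was opened here
                state = "H"
            j -= 1
        else:
            columns += 1
            if F[i][j] == H[i - 1][j] - gap_open:  # the gap was opened here
                state = "H"
            i -= 1
    identity = 100.0 * identical / columns if columns else 0.0
    return best, identity
```

Under these assumed scores, two ten-residue sequences differing at one position align over ten columns with nine identical pairs, that is, 90% identity.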

The invention further provides proteins comprising fragments of the Neisserial amino acid
sequences disclosed in the examples. The fragments should comprise at least n consecutive amino
acids from the sequences and, depending on the particular sequence, n is 7 or more (eg. 8, 10, 12,
14, 16, 18, 20 or more). Preferably the fragments comprise an epitope from the sequence.
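
By way of illustration only, the requirement of "at least n consecutive amino acids" can be checked with a sliding window. The function names and the example sequences below are hypothetical and are not taken from the examples.

```python
def fragments(sequence, n=7):
    """All windows of exactly n consecutive residues of `sequence`."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def shares_fragment(candidate, reference, n=7):
    """True if `candidate` contains at least n consecutive residues of `reference`."""
    reference_windows = set(fragments(reference, n))
    return any(window in reference_windows for window in fragments(candidate, n))


# A made-up 10-residue reference has four 7-residue windows.
print(fragments("MKTAYIAKQR", 7))
print(shares_fragment("GGKTAYIAKGG", "MKTAYIAKQR", 7))
```

A candidate that shares a run longer than n also passes, because any longer shared run contains a shared window of length n.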

The proteins of the invention can, of course, be prepared by various means (eg. recombinant
expression, purification from cell culture, chemical synthesis etc.) and in various forms (eg. native,
fusions etc.). They are preferably prepared in substantially pure or isolated form (ie. substantially
free from other Neisserial or host cell proteins).

According to a further aspect, the invention provides antibodies which bind to these proteins. These 
may be polyclonal or monoclonal and may be produced by any suitable means.

According to a further aspect, the invention provides nucleic acid comprising the Neisserial 
nucleotide sequences disclosed in the examples. In addition, the invention provides nucleic acid 
comprising sequences homologous (ie. having sequence identity) to the Neisserial nucleotide
sequences disclosed in the examples. 

Furthermore, the invention provides nucleic acid which can hybridise to the Neisserial nucleic acid
disclosed in the examples, preferably under "high stringency" conditions (eg. 65°C in a 0.1xSSC,
0.5% SDS solution).

Nucleic acid comprising fragments of these sequences is also provided. These should comprise
at least n consecutive nucleotides from the Neisserial sequences and, depending on the particular
sequence, n is 10 or more (eg. 12, 14, 15, 18, 20, 25, 30, 35, 40 or more).

According to a further aspect, the invention provides nucleic acid encoding the proteins and protein 
fragments of the invention. 

It should also be appreciated that the invention provides nucleic acid comprising sequences 
complementary to those described above (eg. for antisense or probing purposes).
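
By way of illustration only, the complementary strand of a nucleotide sequence, read 5' to 3', can be derived as sketched below. The function name and the example sequence are hypothetical. The sketch returns a DNA complement, so U in an RNA input is paired with A.

```python
# Watson-Crick pairing; U (RNA input) pairs with A in the DNA complement.
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(sequence):
    """Complementary DNA strand of `sequence`, written 5' to 3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


print(reverse_complement("ATGCCG"))
```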

Nucleic acid according to the invention can, of course, be prepared in many ways (eg. by chemical
synthesis, from genomic or cDNA libraries, from the organism itself etc.) and can take various 
forms (eg. single stranded, double stranded, vectors, probes etc.). 

In addition, the term "nucleic acid" includes DNA and RNA, and also their analogues, such as 
those containing modified backbones, and also peptide nucleic acids (PNA) etc.

According to a further aspect, the invention provides vectors comprising nucleotide sequences of 
the invention (eg. expression vectors) and host cells transformed with such vectors. 

According to a further aspect, the invention provides compositions comprising protein, antibody, 
and/or nucleic acid according to the invention. These compositions may be suitable as vaccines, 
for instance, or as diagnostic reagents, or as immunogenic compositions.

The invention also provides nucleic acid, protein, or antibody according to the invention for use 
as medicaments (eg. as vaccines) or as diagnostic reagents. It also provides the use of nucleic acid, 
protein, or antibody according to the invention in the manufacture of: (i) a medicament for treating 
or preventing infection due to Neisserial bacteria; (ii) a diagnostic reagent for detecting the 
presence of Neisserial bacteria or of antibodies raised against Neisserial bacteria; and/or (iii) a
reagent which can raise antibodies against Neisserial bacteria. Said Neisserial bacteria may be any
species or strain (such as N. gonorrhoeae, or any strain of N. meningitidis, such as strain A, strain
B or strain C). 

The invention also provides a method of treating a patient, comprising administering to the patient 
a therapeutically effective amount of nucleic acid, protein, and/or antibody according to the
invention. 

According to further aspects, the invention provides various processes. 

A process for producing proteins of the invention is provided, comprising the step of culturing a 
host cell according to the invention under conditions which induce protein expression. 

A process for producing protein or nucleic acid of the invention is provided, wherein the protein
or nucleic acid is synthesised in part or in whole using chemical means.

A process for detecting polynucleotides of the invention is provided, comprising the steps of: (a)
contacting a nucleic acid probe according to the invention with a biological sample under hybridizing
conditions to form duplexes; and (b) detecting said duplexes.

A process for detecting proteins of the invention is provided, comprising the steps of: (a) contacting 
an antibody according to the invention with a biological sample under conditions suitable for the
formation of antibody-antigen complexes; and (b) detecting said complexes.

A summary of standard techniques and procedures which may be employed in order to perform the 
invention (eg. to utilise the disclosed sequences for vaccination or diagnostic purposes) follows.
This summary is not a limitation on the invention but, rather, gives examples that may be used, but
are not required.

General 

The practice of the present invention will employ, unless otherwise indicated, conventional
techniques of molecular biology, microbiology, recombinant DNA, and immunology, which are
within the skill of the art. Such techniques are explained fully in the literature eg. Sambrook
Molecular Cloning: A Laboratory Manual, Second Edition (1989); DNA Cloning, Volumes I and
II (D.N. Glover ed. 1985); Oligonucleotide Synthesis (M.J. Gait ed. 1984); Nucleic Acid
Hybridization (B.D. Hames & S.J. Higgins eds. 1984); Transcription and Translation (B.D. Hames
& S.J. Higgins eds. 1984); Animal Cell Culture (R.I. Freshney ed. 1986); Immobilized Cells and
Enzymes (IRL Press, 1986); B. Perbal, A Practical Guide to Molecular Cloning (1984); the
Methods in Enzymology series (Academic Press, Inc.), especially volumes 154 & 155; Gene
Transfer Vectors for Mammalian Cells (J.H. Miller and M.P. Calos eds. 1987, Cold Spring Harbor
Laboratory); Mayer and Walker, eds. (1987), Immunochemical Methods in Cell and Molecular
Biology (Academic Press, London); Scopes, (1987) Protein Purification: Principles and Practice,
Second Edition (Springer-Verlag, N.Y.), and Handbook of Experimental Immunology, Volumes
I-IV (D.M. Weir and C.C. Blackwell eds. 1986).

Standard abbreviations for nucleotides and amino acids are used in this specification. 

All publications, patents, and patent applications cited herein are incorporated in full by reference. 
In particular, the contents of UK patent applications 9723516.2, 9724190.5, 9724386.9, 9725158.1, 
9726147.3, 9800759.4, and 9819016.8 are incorporated herein. 

Definitions

A composition containing X is "substantially free of" Y when at least 85% by weight of the total
X+Y in the composition is X. Preferably, X comprises at least about 90% by weight of the total of
X+Y in the composition, more preferably at least about 95% or even 99% by weight.
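
By way of illustration only, the weight criterion in this definition can be expressed as a short calculation. The function names and the example weights are hypothetical; the 85% default is the threshold stated in the definition.

```python
def fraction_x(weight_x, weight_y):
    """Weight fraction of X in the total X+Y of a composition."""
    total = weight_x + weight_y
    if total <= 0:
        raise ValueError("the composition must contain some X or Y")
    return weight_x / total


def substantially_free_of_y(weight_x, weight_y, threshold=0.85):
    """True when X makes up at least `threshold` of the total X+Y by weight."""
    return fraction_x(weight_x, weight_y) >= threshold


# 85 mg of X with 15 mg of Y just meets the 85% criterion.
print(substantially_free_of_y(85, 15))
# The same composition does not meet the preferred 90% level.
print(substantially_free_of_y(85, 15, threshold=0.90))
```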

The term "comprising" means "including" as well as "consisting" eg. a composition "comprising"
X may consist exclusively of X or may include something additional to X, such as X+Y.

A "conserved" Neisseria amino acid fragment or protein is one that is present in a particular
Neisserial protein in at least x% of Neisseria. The value of x may be 50% or more, e.g., 66%, 75%,
80%, 90%, 95% or even 100% (i.e. the amino acid is found in the protein in question in all
Neisseria). In order to determine whether an amino acid is "conserved" in a particular Neisserial
protein, it is necessary to compare that amino acid residue in the sequences of the protein in
question from a plurality of different Neisseria (a reference population). The reference population
may include a number of different Neisseria species or may include a single species. The reference
population may include a number of different serogroups of a particular species or a single
serogroup. A preferred reference population consists of the 5 most common Neisseria.

The term "heterologous" refers to two biological components that are not found together in nature. The
components may be host cells, genes, or regulatory regions, such as promoters. Although the
heterologous components are not found together in nature, they can function together, as when a
promoter heterologous to a gene is operably linked to the gene. Another example is where a
Neisserial sequence is heterologous to a mouse host cell. A further example would be two epitopes
from the same or different proteins which have been assembled in a single protein in an
arrangement not found in nature.
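
Returning to the definition of a "conserved" amino acid given above, and by way of illustration only, the comparison across a reference population can be sketched as follows. The sketch assumes that the sequences of the protein in question have already been aligned to equal length, with "-" marking gaps. It treats each alignment position separately, which is one possible reading of the definition. The function name and the four short sequences are hypothetical.

```python
from collections import Counter


def conserved_positions(aligned, x=50.0):
    """List (position, residue, percent) where one residue occurs in >= x% of sequences.

    `aligned` holds equal-length, pre-aligned sequences of one protein taken
    from the reference population; '-' marks an alignment gap.
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    result = []
    for position, column in enumerate(zip(*aligned)):
        residue, count = Counter(column).most_common(1)[0]
        percent = 100.0 * count / len(aligned)
        if residue != "-" and percent >= x:
            result.append((position, residue, percent))
    return result


population = ["MKTA", "MKSA", "MRTA", "MKTV"]  # made-up reference population
print(conserved_positions(population, x=75.0))
print(conserved_positions(population, x=100.0))
```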

An "origin of replication" is a polynucleotide sequence that initiates and regulates replication of
polynucleotides, such as an expression vector. The origin of replication behaves as an autonomous
unit of polynucleotide replication within a cell, capable of replication under its own control. An
origin of replication may be needed for a vector to replicate in a particular host cell. With certain
origins of replication, an expression vector can be reproduced at a high copy number in the
presence of the appropriate proteins within the cell. Examples of origins are the autonomously
replicating sequences, which are effective in yeast; and the viral T-antigen, effective in COS-7
cells.

A "mutant" sequence is defined as a DNA, RNA or amino acid sequence differing from but having
sequence identity with the native or disclosed sequence. Depending on the particular sequence, the 
degree of sequence identity between the native or disclosed sequence and the mutant sequence is 
preferably greater than 50% (eg. 60%, 70%, 80%, 90%, 95%, 99% or more, calculated using the 
Smith-Waterman algorithm as described above). As used herein, an "allelic variant" of a nucleic
acid molecule, or region, for which nucleic acid sequence is provided herein is a nucleic acid 
molecule, or region, that occurs essentially at the same locus in the genome of another or second 
isolate, and that, due to natural variation caused by, for example, mutation or recombination, has 
a similar but not identical nucleic acid sequence. A coding region allelic variant typically encodes 
a protein having similar activity to that of the protein encoded by the gene to which it is being
compared. An allelic variant can also comprise an alteration in the 5' or 3' untranslated regions of 
the gene, such as in regulatory control regions (eg. see US patent 5,753,235). 

Expression systems 

The Neisserial nucleotide sequences can be expressed in a variety of different expression systems; 
for example those used with mammalian cells, baculoviruses, plants, bacteria, and yeast.

i. Mammalian Systems

Mammalian expression systems are known in the art. A mammalian promoter is any DNA
sequence capable of binding mammalian RNA polymerase and initiating the downstream (3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a
transcription initiating region, which is usually placed proximal to the 5' end of the coding
sequence, and a TATA box, usually located 25-30 base pairs (bp) upstream of the transcription
initiation site. The TATA box is thought to direct RNA polymerase II to begin RNA synthesis at
the correct site. A mammalian promoter will also contain an upstream promoter element, usually
located within 100 to 200 bp upstream of the TATA box. An upstream promoter element
determines the rate at which transcription is initiated and can act in either orientation [Sambrook
et al. (1989) "Expression of Cloned Genes in Mammalian Cells." In Molecular Cloning: A
Laboratory Manual, 2nd ed.].

Mammalian viral genes are often highly expressed and have a broad host range; therefore sequences 
encoding mammalian viral genes provide particularly useful promoter sequences. Examples include 
the SV40 early promoter, mouse mammary tumor virus LTR promoter, adenovirus major late



CHIR-0160 (356.001) 






PATENT 



promoter (Ad MLP), and herpes simplex virus promoter. In addition, sequences derived from
non-viral genes, such as the murine metallothionein gene, also provide useful promoter sequences.
Expression may be either constitutive or regulated (inducible); depending on the promoter,
expression can be induced with glucocorticoid in hormone-responsive cells.

The presence of an enhancer element (enhancer), combined with the promoter elements described
above, will usually increase expression levels. An enhancer is a regulatory DNA sequence that can
stimulate transcription up to 1000-fold when linked to homologous or heterologous promoters, with
synthesis beginning at the normal RNA start site. Enhancers are also active when they are placed
upstream or downstream from the transcription initiation site, in either normal or flipped
orientation, or at a distance of more than 1000 nucleotides from the promoter [Maniatis et al. (1987)
Science 236:1237; Alberts et al. (1989) Molecular Biology of the Cell, 2nd ed.]. Enhancer elements
derived from viruses may be particularly useful, because they usually have a broader host range.
Examples include the SV40 early gene enhancer [Dijkema et al (1985) EMBO J. 4:761] and the
enhancer/promoters derived from the long terminal repeat (LTR) of the Rous Sarcoma Virus
[Gorman et al. (1982b) Proc. Natl. Acad. Sci. 79:6111] and from human cytomegalovirus [Boshart
et al. (1985) Cell 41:521]. Additionally, some enhancers are regulatable and become active only
in the presence of an inducer, such as a hormone or metal ion [Sassone-Corsi and Borelli (1986)
Trends Genet. 2:215; Maniatis et al. (1987) Science 236:1237].

A DNA molecule may be expressed intracellularly in mammalian cells. A promoter sequence may be 
directly linked with the DNA molecule, in which case the first amino acid at the N-terminus of the
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If desired, 
the N-terminus may be cleaved from the protein by in vitro incubation with cyanogen bromide. 

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that
provides for secretion of the foreign protein in mammalian cells. Preferably, there are processing
sites encoded between the leader fragment and the foreign gene that can be cleaved either in vivo
or in vitro. The leader sequence fragment usually encodes a signal peptide comprised of
hydrophobic amino acids which direct the secretion of the protein from the cell. The adenovirus
tripartite leader is an example of a leader sequence that provides for secretion of a foreign protein
in mammalian cells.

Usually, transcription termination and polyadenylation sequences recognized by mammalian cells
are regulatory regions located 3' to the translation stop codon and thus, together with the promoter
elements, flank the coding sequence. The 3' terminus of the mature mRNA is formed by site-specific
post-transcriptional cleavage and polyadenylation [Birnstiel et al. (1985) Cell 41:349;
Proudfoot and Whitelaw (1988) "Termination and 3' end processing of eukaryotic RNA." In
Transcription and splicing (ed. B.D. Hames and D.M. Glover); Proudfoot (1989) Trends Biochem.
Sci. 14:105]. These sequences direct the transcription of an mRNA which can be translated into the
polypeptide encoded by the DNA. Examples of transcription terminator/polyadenylation signals
include those derived from SV40 [Sambrook et al (1989) "Expression of cloned genes in cultured
mammalian cells." In Molecular Cloning: A Laboratory Manual].

Usually, the above described components, comprising a promoter, polyadenylation signal, and
transcription termination sequence are put together into expression constructs. Enhancers, introns
with functional splice donor and acceptor sites, and leader sequences may also be included in an
expression construct, if desired. Expression constructs are often maintained in a replicon, such as
an extrachromosomal element (eg. plasmids) capable of stable maintenance in a host, such as
mammalian cells or bacteria. Mammalian replication systems include those derived from animal
viruses, which require trans-acting factors to replicate. For example, plasmids containing the
replication systems of papovaviruses, such as SV40 [Gluzman (1981) Cell 23:115] or
polyomavirus, replicate to extremely high copy number in the presence of the appropriate viral T
antigen. Additional examples of mammalian replicons include those derived from bovine
papillomavirus and Epstein-Barr virus. Additionally, the replicon may have two replication systems,
thus allowing it to be maintained, for example, in mammalian cells for expression and in a
prokaryotic host for cloning and amplification. Examples of such mammalian-bacteria shuttle
vectors include pMT2 [Kaufman et al. (1989) Mol. Cell. Biol. 9:946] and pHEBO [Shimizu et al.
(1986) Mol. Cell. Biol. 6:1074].

The transformation procedure used depends upon the host to be transformed. Methods for 
introduction of heterologous polynucleotides into mammalian cells are known in the art and include 
dextran-mediated transfection, calcium phosphate precipitation, polybrene-mediated transfection,
protoplast fusion, electroporation, encapsulation of the polynucleotide(s) in liposomes, and direct
microinjection of the DNA into nuclei.




Mammalian cell lines available as hosts for expression are known in the art and include many 
immortalized cell lines available from the American Type Culture Collection (ATCC), including 
but not limited to, Chinese hamster ovary (CHO) cells, HeLa cells, baby hamster kidney (BHK) 
cells, monkey kidney cells (COS), human hepatocellular carcinoma cells (eg. Hep G2), and a
number of other cell lines.
ii. Baculovirus Systems

The polynucleotide encoding the protein can also be inserted into a suitable insect expression
vector, and is operably linked to the control elements within that vector. Vector construction
employs techniques which are known in the art. Generally, the components of the expression
system include a transfer vector, usually a bacterial plasmid, which contains both a fragment of the
baculovirus genome, and a convenient restriction site for insertion of the heterologous gene or
genes to be expressed; a wild type baculovirus with a sequence homologous to the
baculovirus-specific fragment in the transfer vector (this allows for the homologous recombination
of the heterologous gene into the baculovirus genome); and appropriate insect host cells and growth
media.

After inserting the DNA sequence encoding the protein into the transfer vector, the vector and the 
wild type viral genome are transfected into an insect host cell where the vector and viral genome 
are allowed to recombine. The packaged recombinant virus is expressed and recombinant plaques 
are identified and purified. Materials and methods for baculovirus/insect cell expression systems 
are commercially available in kit form from, inter alia, Invitrogen, San Diego CA ("MaxBac" kit).
These techniques are generally known to those skilled in the art and fully described in Summers 
and Smith, Texas Agricultural Experiment Station Bulletin No. 1555 (1987) (hereinafter "Summers 
and Smith"). 

Prior to inserting the DNA sequence encoding the protein into the baculovirus genome, the above
described components, comprising a promoter, leader (if desired), coding sequence of interest, and
transcription termination sequence, are usually assembled into an intermediate transplacement
construct (transfer vector). This construct may contain a single gene and operably linked regulatory
elements; multiple genes, each with its own set of operably linked regulatory elements; or
multiple genes, regulated by the same set of regulatory elements. Intermediate transplacement
constructs are often maintained in a replicon, such as an extrachromosomal element (eg. plasmids)
capable of stable maintenance in a host, such as a bacterium. The replicon will have a replication
system, thus allowing it to be maintained in a suitable host for cloning and amplification.

Currently, the most commonly used transfer vector for introducing foreign genes into AcNPV is
pAc373. Many other vectors, known to those of skill in the art, have also been designed. These
include, for example, pVL985 (which alters the polyhedrin start codon from ATG to ATT, and
which introduces a BamHI cloning site 32 basepairs downstream from the ATT; see Luckow and
Summers, Virology (1989) 17:31).

The plasmid usually also contains the polyhedrin polyadenylation signal (Miller et al. (1988) Ann. 
Rev. Microbiol., 42:111) and a prokaryotic ampicillin-resistance (amp) gene and origin of 
replication for selection and propagation in E. coli.

Baculovirus transfer vectors usually contain a baculovirus promoter. A baculovirus promoter is any
DNA sequence capable of binding a baculovirus RNA polymerase and initiating the downstream
(5' to 3') transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have
a transcription initiation region which is usually placed proximal to the 5' end of the coding
sequence. This transcription initiation region usually includes an RNA polymerase binding site and
a transcription initiation site. A baculovirus transfer vector may also have a second domain called
an enhancer, which, if present, is usually distal to the structural gene. Expression may be either
regulated or constitutive.

Structural genes, abundantly transcribed at late times in a viral infection cycle, provide particularly
useful promoter sequences. Examples include sequences derived from the gene encoding the viral
polyhedrin protein, Friesen et al., (1986) "The Regulation of Baculovirus Gene Expression," in:
The Molecular Biology of Baculoviruses (ed. Walter Doerfler); EPO Publ. Nos. 127 839 and 155
476; and the gene encoding the p10 protein, Vlak et al., (1988), J. Gen. Virol. 69:165.

DNA encoding suitable signal sequences can be derived from genes for secreted insect or
baculovirus proteins, such as the baculovirus polyhedrin gene (Carbonell et al. (1988) Gene,
73:409). Alternatively, since the signals for mammalian cell posttranslational modifications (such
as signal peptide cleavage, proteolytic cleavage, and phosphorylation) appear to be recognized by
insect cells, and the signals required for secretion and nuclear accumulation also appear to be
conserved between the invertebrate cells and vertebrate cells, leaders of non-insect origin, such as
those derived from genes encoding human α-interferon, Maeda et al., (1985), Nature 315:592;
human gastrin-releasing peptide, Lebacq-Verheyden et al., (1988), Molec. Cell. Biol. 8:3129;
human IL-2, Smith et al., (1985) Proc. Nat'l Acad. Sci. USA, 82:8404; mouse IL-3, Miyajima et
al., (1987) Gene 55:273; and human glucocerebrosidase, Martin et al. (1988) DNA, 7:99, can also
be used to provide for secretion in insects.

A recombinant polypeptide or polyprotein may be expressed intracellularly or, if it is expressed 
with the proper regulatory sequences, it can be secreted. Good intracellular expression of nonfused 
foreign proteins usually requires heterologous genes that ideally have a short leader sequence 
containing suitable translation initiation signals preceding an ATG start signal. If desired, 
methionine at the N-terminus may be cleaved from the mature protein by in vitro incubation with
cyanogen bromide. 

Alternatively, recombinant polyproteins or proteins which are not naturally secreted can be secreted 
from the insect cell by creating chimeric DNA molecules that encode a fusion protein comprised 
of a leader sequence fragment that provides for secretion of the foreign protein in insects. The 
leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids
which direct the translocation of the protein into the endoplasmic reticulum. 
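
The hydrophobic character of a candidate leader can be checked numerically. The following is a minimal sketch, not part of the described method: it assumes the Kyte-Doolittle hydropathy scale as the measure of hydrophobicity, and the function names, the window length of 8 residues and the example sequence are hypothetical choices made for illustration only.

```python
# Kyte-Doolittle hydropathy values (assumed scale; higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(peptide):
    """Average hydropathy of a one-letter amino acid sequence."""
    residues = peptide.upper()
    return sum(KD[r] for r in residues) / len(residues)


def most_hydrophobic_window(protein, window=8):
    """Return (score, start) of the most hydrophobic stretch of `window` residues."""
    seq = protein.upper()
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window")
    scores = [(mean_hydropathy(seq[i:i + window]), i)
              for i in range(len(seq) - window + 1)]
    return max(scores)


if __name__ == "__main__":
    # Hypothetical fusion: a leucine-rich core flanked by charged residues.
    score, start = most_hydrophobic_window("DDDDLLLLLLLLDDDD")
    print(f"most hydrophobic window starts at {start}, mean hydropathy {score:.2f}")
```

A strongly positive window near the N-terminus is consistent with a signal-peptide-like core, but it does not by itself show that a leader will direct secretion.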

After insertion of the DNA sequence and/or the gene encoding the expression product precursor
of the protein, an insect cell host is co-transformed with the heterologous DNA of the transfer
vector and the genomic DNA of wild type baculovirus, usually by co-transfection. The promoter
and transcription termination sequence of the construct will usually comprise a 2-5kb section of the
baculovirus genome. Methods for introducing heterologous DNA into the desired site in the
baculovirus are known in the art. (See Summers and Smith supra; Ju et al. (1987); Smith et
al., Mol. Cell. Biol. (1983) 3:2156; and Luckow and Summers (1989)). For example, the insertion
can be into a gene such as the polyhedrin gene, by homologous double crossover recombination;
insertion can also be into a restriction enzyme site engineered into the desired baculovirus gene.
Miller et al., (1989), Bioessays 4:91. The DNA sequence, when cloned in place of the polyhedrin
gene in the expression vector, is flanked both 5' and 3' by polyhedrin-specific sequences and is
positioned downstream of the polyhedrin promoter.

The newly formed baculovirus expression vector is subsequently packaged into an infectious
recombinant baculovirus. Homologous recombination occurs at low frequency (between about 1%
and about 5%); thus, the majority of the virus produced after cotransfection is still wild-type virus.
Therefore, a method is necessary to identify recombinant viruses. An advantage of the expression
system is a visual screen allowing recombinant viruses to be distinguished. The polyhedrin protein,
which is produced by the native virus, is produced at very high levels in the nuclei of infected cells
at late times after viral infection. Accumulated polyhedrin protein forms occlusion bodies that also
contain embedded particles. These occlusion bodies, up to 15 μm in size, are highly refractile,
giving them a bright shiny appearance that is readily visualized under the light microscope. Cells
infected with recombinant viruses lack occlusion bodies. To distinguish recombinant virus from
wild-type virus, the transfection supernatant is plaqued onto a monolayer of insect cells by
techniques known to those skilled in the art. Namely, the plaques are screened under the light
microscope for the presence (indicative of wild-type virus) or absence (indicative of recombinant
virus) of occlusion bodies. "Current Protocols in Microbiology" Vol. 2 (Ausubel et al. eds) at 16.8
(Supp. 10, 1990); Summers and Smith, supra; Miller et al. (1989).
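
Because recombinants are rare, the number of plaques that must be screened follows from the recombination frequency. The sketch below is illustrative arithmetic only. It assumes that each plaque is independently recombinant with probability equal to the stated frequency (about 1% to about 5%); the function names and the 95% confidence target are hypothetical.

```python
import math


def prob_at_least_one_recombinant(freq, n_plaques):
    """Probability that n independent plaques include at least one recombinant."""
    return 1.0 - (1.0 - freq) ** n_plaques


def plaques_needed(freq, confidence=0.95):
    """Smallest number of plaques giving P(at least one recombinant) >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))


if __name__ == "__main__":
    for freq in (0.01, 0.05):
        n = plaques_needed(freq)
        p = prob_at_least_one_recombinant(freq, n)
        print(f"frequency {freq:.0%}: screen {n} plaques (P = {p:.3f})")
```

Under these assumptions, roughly 59 plaques suffice at a 5% frequency and roughly 299 at a 1% frequency, which is why a rapid visual screen for occlusion-negative plaques is useful.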

Recombinant baculovirus expression vectors have been developed for infection into several insect
cells. For example, recombinant baculoviruses have been developed for, inter alia: Aedes aegypti,
Autographa californica, Bombyx mori, Drosophila melanogaster, Spodoptera frugiperda, and
Trichoplusia ni (WO 89/046699; Carbonell et al., (1985) J. Virol. 56:153; Wright (1986) Nature
321:718; Smith et al., (1983) Mol. Cell. Biol. 3:2156; and see generally, Fraser, et al. (1989) In
Vitro Cell. Dev. Biol. 25:225).

Cells and cell culture media are commercially available for both direct and fusion expression of 
heterologous polypeptides in a baculovirus/expression system; cell culture technology is generally 
known to those skilled in the art. See, eg. Summers and Smith supra. 

The modified insect cells may then be grown in an appropriate nutrient medium, which allows for
stable maintenance of the plasmid(s) present in the modified insect host. Where the expression
product gene is under inducible control, the host may be grown to high density, and expression
induced. Alternatively, where expression is constitutive, the product will be continuously expressed
into the medium and the nutrient medium must be continuously circulated, while removing the
product of interest and augmenting depleted nutrients. The product may be purified by such
techniques as chromatography, eg. HPLC, affinity chromatography, ion exchange chromatography,
etc.; electrophoresis; density gradient centrifugation; solvent extraction, or the like. As appropriate,
the product may be further purified, as required, so as to remove substantially any insect proteins
which are also secreted in the medium or result from lysis of insect cells, so as to provide a product
which is at least substantially free of host debris, eg. proteins, lipids and polysaccharides.

In order to obtain protein expression, recombinant host cells derived from the transformants are 
incubated under conditions which allow expression of the recombinant protein encoding sequence. 
These conditions will vary, dependent upon the host cell selected. However, the conditions are 
readily ascertainable to those of ordinary skill in the art, based upon what is known in the art. 
iii. Plant Systems

There are many plant cell culture and whole plant genetic expression systems known in the art. 
Exemplary plant cellular genetic expression systems include those described in patents, such as: 
US 5,693,506; US 5,659,122; and US 5,608,143. Additional examples of genetic expression in
plant cell culture have been described by Zenk, Phytochemistry 30:3861-3863 (1991). Descriptions
of plant protein signal peptides may be found in addition to the references described above in
Vaulcombe et al., Mol. Gen. Genet. 209:33-40 (1987); Chandler et al., Plant Molecular Biology
3:407-418 (1984); Rogers, J. Biol. Chem. 260:3731-3738 (1985); Rothstein et al., Gene 55:353-356
(1987); Whittier et al., Nucleic Acids Research 15:2515-2535 (1987); Wirsel et al., Molecular
Microbiology 3:3-14 (1989); Yu et al., Gene 122:247-253 (1992). A description of the regulation
of plant gene expression by the phytohormone, gibberellic acid and secreted enzymes induced by
gibberellic acid can be found in R.L. Jones and J. MacMillin, Gibberellins: in: Advanced Plant
Physiology, Malcolm B. Wilkins, ed., 1984 Pitman Publishing Limited, London, pp. 21-52.
References that describe other metabolically-regulated genes: Sheen, Plant Cell, 2:1027-1038
(1990); Maas et al., EMBO J. 9:3447-3452 (1990); Benkel and Hickey, Proc. Natl. Acad. Sci.
84:1337-1339 (1987).

Typically, using techniques known in the art, a desired polynucleotide sequence is inserted into an
expression cassette comprising genetic regulatory elements designed for operation in plants. The
expression cassette is inserted into a desired expression vector with companion sequences upstream
and downstream from the expression cassette suitable for expression in a plant host. The
companion sequences will be of plasmid or viral origin and provide necessary characteristics to the
vector to permit the vectors to move DNA from an original cloning host, such as bacteria, to the
desired plant host. The basic bacterial/plant vector construct will preferably provide a broad host
range prokaryote replication origin; a prokaryote selectable marker; and, for Agrobacterium
transformations, T DNA sequences for Agrobacterium-mediated transfer to plant chromosomes.
Where the heterologous gene is not readily amenable to detection, the construct will preferably also
have a selectable marker gene suitable for determining if a plant cell has been transformed. A
general review of suitable markers, for example for the members of the grass family, is found in
Wilmink and Dons, 1993, Plant Mol. Biol. Reptr, 11(2):165-185.

Sequences suitable for permitting integration of the heterologous sequence into the plant genome 
are also recommended. These might include transposon sequences and the like for homologous
recombination as well as Ti sequences which permit random insertion of a heterologous expression 
cassette into a plant genome. Suitable prokaryote selectable markers include resistance toward 
antibiotics such as ampicillin or tetracycline. Other DNA sequences encoding additional functions 
may also be present in the vector, as is known in the art. 

The nucleic acid molecules of the subject invention may be included into an expression cassette
for expression of the protein(s) of interest. Usually, there will be only one expression cassette,
although two or more are feasible. The recombinant expression cassette will contain, in addition
to the heterologous protein encoding sequence, the following elements: a promoter region, plant 5'
untranslated sequences, an initiation codon (depending upon whether or not the structural gene comes
equipped with one), and a transcription and translation termination sequence. Unique restriction
enzyme sites at the 5' and 3' ends of the cassette allow for easy insertion into a pre-existing vector.

A heterologous coding sequence may be for any protein relating to the present invention. The
sequence encoding the protein of interest will encode a signal peptide which allows processing and
translocation of the protein, as appropriate, and will usually lack any sequence which might result
in the binding of the desired protein of the invention to a membrane. Since, for the most part, the
transcriptional initiation region will be for a gene which is expressed and translocated during
germination, by employing the signal peptide which provides for translocation, one may also
provide for translocation of the protein of interest. In this way, the protein(s) of interest will be
translocated from the cells in which they are expressed and may be efficiently harvested. Typically,
secretion in seeds is across the aleurone or scutellar epithelium layer into the endosperm of the
seed. While it is not required that the protein be secreted from the cells in which the protein is
produced, this facilitates the isolation and purification of the recombinant protein.

Since the ultimate expression of the desired gene product will be in a eucaryotic cell, it is desirable
to determine whether any portion of the cloned gene contains sequences which will be processed
out as introns by the host's spliceosome machinery. If so, site-directed mutagenesis of the "intron"
region may be conducted to prevent losing a portion of the genetic message as a false intron code,
Reed and Maniatis, Cell 41:95-105, 1985.

The vector can be microinjected directly into plant cells by use of micropipettes to mechanically
transfer the recombinant DNA. Crossway, Mol. Gen. Genet, 202:179-185, 1985. The genetic
material may also be transferred into the plant cell by using polyethylene glycol, Krens, et al.,
Nature, 296, 72-74, 1982. Another method of introduction of nucleic acid segments is high
velocity ballistic penetration by small particles with the nucleic acid either within the matrix of
small beads or particles, or on the surface, Klein, et al., Nature, 327, 70-73, 1987 and Knudsen and
Muller, 1991, Planta, 185:330-336 teaching particle bombardment of barley endosperm to create
transgenic barley. Yet another method of introduction would be fusion of protoplasts with other
entities, either minicells, cells, lysosomes or other fusible lipid-surfaced bodies, Fraley, et al., Proc.
Natl. Acad. Sci. USA, 79, 1859-1863, 1982.

The vector may also be introduced into the plant cells by electroporation. (Fromm et al., Proc. Natl.
Acad. Sci. USA 82:5824, 1985). In this technique, plant protoplasts are electroporated in the
presence of plasmids containing the gene construct. Electrical impulses of high field strength
reversibly permeabilize biomembranes allowing the introduction of the plasmids. Electroporated 
plant protoplasts reform the cell wall, divide, and form plant callus. 

All plants from which protoplasts can be isolated and cultured to give whole regenerated plants can
be transformed by the present invention so that whole plants are recovered which contain the
transferred gene. It is known that practically all plants can be regenerated from cultured cells or
tissues, including but not limited to all major species of sugarcane, sugar beet, cotton, fruit and
other trees, legumes and vegetables. Some suitable plants include, for example, species from the
genera Fragaria, Lotus, Medicago, Onobrychis, Trifolium, Trigonella, Vigna, Citrus, Linum,
Geranium, Manihot, Daucus, Arabidopsis, Brassica, Raphanus, Sinapis, Atropa, Capsicum,
Datura, Hyoscyamus, Lycopersicon, Nicotiana, Solanum, Petunia, Digitalis, Majorana, Cichorium,
Helianthus, Lactuca, Bromus, Asparagus, Antirrhinum, Hemerocallis, Nemesia, Pelargonium,
Panicum, Pennisetum, Ranunculus, Senecio, Salpiglossis, Cucumis, Browallia, Glycine, Lolium,
Zea, Triticum, Sorghum, and Datura.

Means for regeneration vary from species to species of plants, but generally a suspension of 
transformed protoplasts containing copies of the heterologous gene is first provided. Callus tissue
is formed and shoots may be induced from callus and subsequently rooted. Alternatively, embryo
formation can be induced from the protoplast suspension. These embryos germinate as natural
embryos to form plants. The culture media will generally contain various amino acids and
hormones, such as auxin and cytokinins. It is also advantageous to add glutamic acid and proline
to the medium, especially for such species as corn and alfalfa. Shoots and roots normally develop
simultaneously. Efficient regeneration will depend on the medium, on the genotype, and on the 
history of the culture. If these three variables are controlled, then regeneration is fully reproducible 
and repeatable. 

In some plant cell culture systems, the desired protein of the invention may be excreted or,
alternatively, the protein may be extracted from the whole plant. Where the desired protein of the
invention is secreted into the medium, it may be collected. Alternatively, the embryos and
embryoless-half seeds or other plant tissue may be mechanically disrupted to release any secreted
protein between cells and tissues. The mixture may be suspended in a buffer solution to retrieve
soluble proteins. Conventional protein isolation and purification methods will then be used to
purify the recombinant protein. Parameters of time, temperature, pH, oxygen, and volumes will be
adjusted through routine methods to optimize expression and recovery of heterologous protein. 
iv. Bacterial Systems 

Bacterial expression techniques are known in the art. A bacterial promoter is any DNA sequence
capable of binding bacterial RNA polymerase and initiating the downstream (3') transcription of
a coding sequence (eg. structural gene) into mRNA. A promoter will have a transcription initiation
region which is usually placed proximal to the 5' end of the coding sequence. This transcription
initiation region usually includes an RNA polymerase binding site and a transcription initiation site.
A bacterial promoter may also have a second domain called an operator, that may overlap an
adjacent RNA polymerase binding site at which RNA synthesis begins. The operator permits
negatively regulated (inducible) transcription, as a gene repressor protein may bind the operator and
thereby inhibit transcription of a specific gene. Constitutive expression may occur in the absence
of negative regulatory elements, such as the operator. In addition, positive regulation may be
achieved by a gene activator protein binding sequence, which, if present, is usually proximal (5')
to the RNA polymerase binding sequence. An example of a gene activator protein is the catabolite
activator protein (CAP), which helps initiate transcription of the lac operon in Escherichia coli (E.
coli) [Raibaud et al. (1984) Annu. Rev. Genet. 18:173]. Regulated expression may therefore be
either positive or negative, thereby either enhancing or reducing transcription.

Sequences encoding metabolic pathway enzymes provide particularly useful promoter sequences.
Examples include promoter sequences derived from sugar metabolizing enzymes, such as galactose,
lactose (lac) [Chang et al. (1977) Nature 198:1056], and maltose. Additional examples include
promoter sequences derived from biosynthetic enzymes such as tryptophan (trp) [Goeddel et al.
(1980) Nuc. Acids Res. 8:4057; Yelverton et al. (1981) Nucl. Acids Res. 9:731; US
patent 4,738,921; EP-A-0036776 and EP-A-0121775]. The β-lactamase (bla) promoter system
[Weissmann (1981) "The cloning of interferon and other mistakes." In Interferon 3 (ed. I. Gresser)],
bacteriophage lambda PL [Shimatake et al. (1981) Nature 292:128] and T5 [US patent 4,689,406]
promoter systems also provide useful promoter sequences.

In addition, synthetic promoters which do not occur in nature also function as bacterial promoters.
For example, transcription activation sequences of one bacterial or bacteriophage promoter may
be joined with the operon sequences of another bacterial or bacteriophage promoter, creating a
synthetic hybrid promoter [US patent 4,551,433]. For example, the tac promoter is a hybrid trp-lac
promoter comprised of both trp promoter and lac operon sequences that is regulated by the lac
repressor [Amann et al. (1983) Gene 25:167; de Boer et al. (1983) Proc. Natl. Acad. Sci. 80:21].
Furthermore, a bacterial promoter can include naturally occurring promoters of non-bacterial origin
that have the ability to bind bacterial RNA polymerase and initiate transcription. A naturally
occurring promoter of non-bacterial origin can also be coupled with a compatible RNA polymerase
to produce high levels of expression of some genes in prokaryotes. The bacteriophage T7 RNA
polymerase/promoter system is an example of a coupled promoter system [Studier et al. (1986) J.
Mol. Biol. 189:113; Tabor et al. (1985) Proc. Natl. Acad. Sci. 82:1074]. In addition, a hybrid
promoter can also be comprised of a bacteriophage promoter and an E. coli operator region
(EPO-A-0 267 851).

In addition to a functioning promoter sequence, an efficient ribosome binding site is also useful for
the expression of foreign genes in prokaryotes. In E. coli, the ribosome binding site is called the
Shine-Dalgarno (SD) sequence and includes an initiation codon (ATG) and a sequence 3-9
nucleotides in length located 3-11 nucleotides upstream of the initiation codon [Shine et al. (1975)
Nature 254:34]. The SD sequence is thought to promote binding of mRNA to the ribosome by the
pairing of bases between the SD sequence and the 3' end of E. coli 16S rRNA [Steitz et al. (1979)
"Genetic signals and nucleotide sequences in messenger RNA." In Biological Regulation and
Development: Gene Expression (ed. R.F. Goldberger)]. To express eukaryotic genes and
prokaryotic genes with weak ribosome-binding site [Sambrook et al. (1989) "Expression of cloned
genes in Escherichia coli." In Molecular Cloning: A Laboratory Manual].
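
The spacing rule for the SD sequence lends itself to a simple sequence scan. The following sketch is illustrative only and rests on several assumptions: it takes AGGAGG as the SD consensus, accepts any contiguous piece of that consensus at least 3 nucleotides long, and interprets "3-11 nucleotides upstream" as the number of nucleotides between the end of the SD match and the ATG. The function name and the example sequences are hypothetical.

```python
def find_sd_candidates(mrna, consensus="AGGAGG", min_len=3, min_gap=3, max_gap=11):
    """For each ATG, report the longest SD-like match at an allowed spacing.

    Returns a list of (atg_position, sd_position, sd_sequence, gap) tuples,
    where gap is the number of nucleotides between the SD match and the ATG.
    """
    seq = mrna.upper().replace("U", "T")
    # Every contiguous piece of the consensus of at least min_len nucleotides,
    # ordered so that the longest pieces are tried first.
    motifs = sorted(
        {consensus[i:j]
         for i in range(len(consensus))
         for j in range(i + min_len, len(consensus) + 1)},
        key=len,
        reverse=True,
    )
    hits = []
    start = seq.find("ATG")
    while start != -1:
        best = None
        for motif in motifs:
            for gap in range(min_gap, max_gap + 1):
                sd_start = start - gap - len(motif)
                if sd_start >= 0 and seq.startswith(motif, sd_start):
                    best = (start, sd_start, motif, gap)
                    break
            if best:
                break
        if best:
            hits.append(best)
        start = seq.find("ATG", start + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical leader: AGGAGG placed six nucleotides upstream of the ATG.
    print(find_sd_candidates("TTAGGAGGTAACATATGAAA"))
```

A start codon with no match in the allowed interval is a candidate for the weak ribosome binding sites mentioned above, although real translation efficiency depends on more than this motif.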

A DNA molecule may be expressed intracellularly. A promoter sequence may be directly linked
with the DNA molecule, in which case the first amino acid at the N-terminus will always be a
methionine, which is encoded by the ATG start codon. If desired, methionine at the N-terminus
may be cleaved from the protein by in vitro incubation with cyanogen bromide or by either in vivo
or in vitro incubation with a bacterial methionine N-terminal peptidase (EPO-A-0 219 237).
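
Cyanogen bromide cleaves on the C-terminal side of every methionine residue, not only the N-terminal one, so it is worth checking a sequence for internal methionines before choosing this route. The sketch below is a simple illustration: the function name and the example peptides are hypothetical, and the chemistry (for example, conversion of the cleaved methionine to homoserine lactone) is not modelled.

```python
def cnbr_fragments(protein):
    """Split a one-letter protein sequence after every methionine (M)."""
    fragments = []
    current = ""
    for residue in protein.upper():
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


if __name__ == "__main__":
    # Only an N-terminal Met: the rest of the protein is released intact.
    print(cnbr_fragments("MKTAYIAK"))
    # An internal Met is cleaved as well, fragmenting the product.
    print(cnbr_fragments("MKTMAYK"))
```

When the second case applies, the methionine N-terminal peptidase route may be the more suitable choice.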

Fusion proteins provide an alternative to direct expression. Usually, a DNA sequence encoding the
N-terminal portion of an endogenous bacterial protein, or other stable protein, is fused to the 5' end
of heterologous coding sequences. Upon expression, this construct will provide a fusion of the two
amino acid sequences. For example, the bacteriophage lambda cII gene can be linked at the 5'
terminus of a foreign gene and expressed in bacteria. The resulting fusion protein preferably retains
a site for a processing enzyme (factor Xa) to cleave the bacteriophage protein from the foreign gene
[Nagai et al. (1984) Nature 309:810]. Fusion proteins can also be made with sequences from the
lacZ [Jia et al. (1987) Gene 60:191], trpE [Allen et al. (1987) J. Biotechnol. 5:93; Makoff et al.
(1989) J. Gen. Microbiol. 135:11], and CheY [EP-A-0 324 647] genes. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. Another example
is a ubiquitin fusion protein. Such a fusion protein is made with the ubiquitin region that preferably
retains a site for a processing enzyme (eg. ubiquitin specific processing-protease) to cleave the
ubiquitin from the foreign protein. Through this method, native foreign protein can be isolated
[Miller et al. (1989) Bio/Technology 7:698].

Alternatively, foreign proteins can also be secreted from the cell by creating chimeric DNA molecules
that encode a fusion protein comprised of a signal peptide sequence fragment that provides for secretion
of the foreign protein in bacteria [US patent 4,336,336]. The signal sequence fragment usually encodes
a signal peptide comprised of hydrophobic amino acids which direct the secretion of the protein from
the cell. The protein is either secreted into the growth media (gram-positive bacteria) or into the
periplasmic space, located between the inner and outer membrane of the cell (gram-negative bacteria).
Preferably there are processing sites, which can be cleaved either in vivo or in vitro, encoded between the
signal peptide fragment and the foreign gene.

DNA encoding suitable signal sequences can be derived from genes for secreted bacterial proteins,
such as the E. coli outer membrane protein gene (ompA) [Masui et al. (1983), in: Experimental
Manipulation of Gene Expression; Ghrayeb et al. (1984) EMBO J. 3:2437] and the E. coli alkaline
phosphatase signal sequence (phoA) [Oka et al. (1985) Proc. Natl. Acad. Sci. 82:7212]. As an
additional example, the signal sequence of the alpha-amylase gene from various Bacillus strains
can be used to secrete heterologous proteins from B. subtilis [Palva et al. (1982) Proc. Natl. Acad.
Sci. USA 79:5582; EP-A-0 244 042].

Usually, transcription termination sequences recognized by bacteria are regulatory regions located 
3' to the translation stop codon, and thus together with the promoter flank the coding sequence. 
These sequences direct the transcription of an mRNA which can be translated into the polypeptide 
encoded by the DNA. Transcription termination sequences frequently include DNA sequences of 
about 50 nucleotides capable of forming stem loop structures that aid in terminating transcription.
Examples include transcription termination sequences derived from genes with strong promoters, 
such as the trp gene in E. coli as well as other biosynthetic genes. 
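
A region capable of forming a stem loop contains an inverted repeat: a stretch followed, after a short loop, by its own reverse complement. The sketch below shows one simple way to locate such repeats. It is an assumption-laden illustration rather than a terminator predictor: it looks only for perfect repeats of a fixed stem length, and the function names, the default stem and loop sizes, and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def find_stem_loops(dna, stem_len=6, min_loop=3, max_loop=8):
    """Return (start, stem, loop) for each perfect inverted repeat found."""
    seq = dna.upper()
    hits = []
    for i in range(len(seq) - 2 * stem_len - min_loop + 1):
        stem = seq[i:i + stem_len]
        partner = reverse_complement(stem)
        for loop_len in range(min_loop, max_loop + 1):
            j = i + stem_len + loop_len  # start of the downstream arm
            if j + stem_len > len(seq):
                break
            if seq[j:j + stem_len] == partner:
                hits.append((i, stem, seq[i + stem_len:j]))
    return hits


if __name__ == "__main__":
    # Hypothetical GC-rich hairpin followed by a run of T residues.
    print(find_stem_loops("TTGCCCGCTTTTGCGGGCTTTT"))
```

Natural terminators often tolerate mismatches and longer stems, so a practical search would relax the perfect-match requirement used here.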

Usually, the above described components, comprising a promoter, signal sequence (if desired), 
coding sequence of interest, and transcription termination sequence, are put together into expression
constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal
element (eg. plasmids) capable of stable maintenance in a host, such as bacteria. The replicon will
have a replication system, thus allowing it to be maintained in a prokaryotic host either for
expression or for cloning and amplification. In addition, a replicon may be either a high or low
copy number plasmid. A high copy number plasmid will generally have a copy number ranging
from about 5 to about 200, and usually about 10 to about 150. A host containing a high copy
number plasmid will preferably contain at least about 10, and more preferably at least about 20
plasmids. Either a high or low copy number vector may be selected, depending upon the effect of 
the vector and the foreign protein on the host. 
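
By way of illustration only, the 5' to 3' arrangement of the components described above can be
sketched in Python. The names ExpressionConstruct and is_high_copy, and the element labels used
in the usage lines, are hypothetical and do not appear in the specification. The numeric bounds
restate the "about 5 to about 200" range given above and are treated as hard limits here purely
for simplicity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExpressionConstruct:
    """Hypothetical container for the elements of a bacterial expression construct."""
    promoter: str
    coding_sequence: str
    terminator: str
    signal_sequence: Optional[str] = None  # included only "if desired"

    def elements_5_to_3(self) -> List[str]:
        """Return the elements in 5' to 3' order: the promoter and the
        terminator flank the (optionally signal-fused) coding sequence."""
        parts = [self.promoter]
        if self.signal_sequence is not None:
            parts.append(self.signal_sequence)
        parts.extend([self.coding_sequence, self.terminator])
        return parts


def is_high_copy(copy_number: int) -> bool:
    """Assumption: 'about 5 to about 200' is read as an inclusive numeric range."""
    return 5 <= copy_number <= 200


construct = ExpressionConstruct(
    promoter="promoter",
    coding_sequence="coding sequence of interest",
    terminator="transcription terminator",
    signal_sequence="ompA signal sequence",
)
print(construct.elements_5_to_3())
print(is_high_copy(20))
```

The optional field mirrors the "signal sequence (if desired)" wording; leaving it unset yields a
construct intended for intracellular expression.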

Alternatively, the expression constructs can be integrated into the bacterial genome with an 
integrating vector. Integrating vectors usually contain at least one sequence homologous to the
bacterial chromosome that allows the vector to integrate. Integrations appear to result from 
recombinations between homologous DNA in the vector and the bacterial chromosome. For 
example, integrating vectors constructed with DNA from various Bacillus strains integrate into the 
Bacillus chromosome (EP-A-0 127 328). Integrating vectors may also be comprised of
bacteriophage or transposon sequences.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers 
to allow for the selection of bacterial strains that have been transformed. Selectable markers can 
be expressed in the bacterial host and may include genes which render bacteria resistant to drugs 
such as ampicillin, chloramphenicol, erythromycin, kanamycin (neomycin), and tetracycline 
[Davies et al. (1978) Annu. Rev. Microbiol. 32:469]. Selectable markers may also include
biosynthetic genes, such as those in the histidine, tryptophan, and leucine biosynthetic pathways. 

Alternatively, some of the above described components can be put together in transformation 
vectors. Transformation vectors are usually comprised of a selectable marker that is either
maintained in a replicon or developed into an integrating vector, as described above. 

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors,
have been developed for transformation into many bacteria. For example, expression vectors have
been developed for, inter alia, the following bacteria: Bacillus subtilis [Palva et al. (1982) Proc.
Natl. Acad. Sci. USA 79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541], Escherichia
coli [Shimatake et al. (1981) Nature 292:128; Amann et al. (1985) Gene 40:183; Studier et al.
(1986) J. Mol. Biol. 189:113; EP-A-0 036 776, EP-A-0 136 829 and EP-A-0 136 907],
Streptococcus cremoris [Powell et al. (1988) Appl. Environ. Microbiol. 54:655], Streptococcus
lividans [Powell et al. (1988) Appl. Environ. Microbiol. 54:655], Streptomyces lividans [US patent
4,745,056].

Methods of introducing exogenous DNA into bacterial hosts are well-known in the art, and usually
include either the transformation of bacteria treated with CaCl2 or other agents, such as divalent
cations and DMSO. DNA can also be introduced into bacterial cells by electroporation.
Transformation procedures usually vary with the bacterial species to be transformed. See eg.
[Masson et al. (1989) FEMS Microbiol. Lett. 60:213; Palva et al. (1982) Proc. Natl. Acad. Sci. USA
79:5582; EP-A-0 036 259 and EP-A-0 063 953; WO 84/04541, Bacillus], [Miller et al. (1988)
Proc. Natl. Acad. Sci. 85:856; Wang et al. (1990) J. Bacteriol. 172:949, Campylobacter], [Cohen
et al. (1973) Proc. Natl. Acad. Sci. 69:2110; Dower et al. (1988) Nucleic Acids Res. 16:6121;
Kushner (1978) "An improved method for transformation of Escherichia coli with ColE1-derived
plasmids," in: Genetic Engineering: Proceedings of the International Symposium on Genetic
Engineering (eds. H.W. Boyer and S. Nicosia); Mandel et al. (1970) J. Mol. Biol. 53:159; Taketo
(1988) Biochim. Biophys. Acta 949:318, Escherichia], [Chassy et al. (1987) FEMS Microbiol. Lett.
44:113, Lactobacillus], [Fiedler et al. (1988) Anal. Biochem. 170:38, Pseudomonas], [Augustin et
al. (1990) FEMS Microbiol. Lett. 66:203, Staphylococcus], [Barany et al. (1980) J. Bacteriol.
144:698; Harlander (1987) "Transformation of Streptococcus lactis by electroporation," in:
Streptococcal Genetics (ed. J. Ferretti and R. Curtiss III); Perry et al. (1981) Infect. Immun.
32:1295; Powell et al. (1988) Appl. Environ. Microbiol. 54:655; Somkuti et al. (1987) Proc. 4th
Eur. Cong. Biotechnology 1:412, Streptococcus].
v. Yeast Expression 

Yeast expression systems are also known to one of ordinary skill in the art. A yeast promoter is any
DNA sequence capable of binding yeast RNA polymerase and initiating the downstream (3')
transcription of a coding sequence (eg. structural gene) into mRNA. A promoter will have a
transcription initiation region which is usually placed proximal to the 5' end of the coding sequence.
This transcription initiation region usually includes an RNA polymerase binding site (the "TATA
Box") and a transcription initiation site. A yeast promoter may also have a second domain called
an upstream activator sequence (UAS), which, if present, is usually distal to the structural gene. 
The UAS permits regulated (inducible) expression. Constitutive expression occurs in the absence 
of a UAS. Regulated expression may be either positive or negative, thereby either enhancing or 
reducing transcription. 

Yeast is a fermenting organism with an active metabolic pathway; therefore, sequences encoding
enzymes in the metabolic pathway provide particularly useful promoter sequences. Examples
include alcohol dehydrogenase (ADH) (EP-A-0 284 044), enolase, glucokinase, glucose-6-
phosphate isomerase, glyceraldehyde-3-phosphate-dehydrogenase (GAP or GAPDH), hexokinase,
phosphofructokinase, 3-phosphoglycerate mutase, and pyruvate kinase (PyK) (EPO-A-0 329 203).
The yeast PHO5 gene, encoding acid phosphatase, also provides useful promoter sequences
[Myanohara et al. (1983) Proc. Natl. Acad. Sci. USA 80:1].

In addition, synthetic promoters which do not occur in nature also function as yeast promoters. For 
example, UAS sequences of one yeast promoter may be joined with the transcription activation 
region of another yeast promoter, creating a synthetic hybrid promoter. Examples of such hybrid 
promoters include the ADH regulatory sequence linked to the GAP transcription activation region
(US Patent Nos. 4,876,197 and 4,880,734). Other examples of hybrid promoters include promoters
which consist of the regulatory sequences of either the ADH2, GAL4, GAL10, or PHO5 genes,
combined with the transcriptional activation region of a glycolytic enzyme gene such as GAP or
PyK (EP-A-0 164 556). Furthermore, a yeast promoter can include naturally occurring promoters
of non-yeast origin that have the ability to bind yeast RNA polymerase and initiate transcription.
Examples of such promoters include, inter alia, [Cohen et al. (1980) Proc. Natl. Acad. Sci. USA
77:1078; Henikoff et al. (1981) Nature 255:835; Hollenberg et al. (1981) Curr. Topics Microbiol.
Immunol. 96:119; Hollenberg et al. (1979) "The Expression of Bacterial Antibiotic Resistance
Genes in the Yeast Saccharomyces cerevisiae," in: Plasmids of Medical, Environmental and
Commercial Importance (eds. K.N. Timmis and A. Puhler); Mercerau-Puigalon et al. (1980) Gene
11:163; Panthier et al. (1980) Curr. Genet. 2:109].

A DNA molecule may be expressed intracellularly in yeast. A promoter sequence may be directly 
linked with the DNA molecule, in which case the first amino acid at the N-terminus of the 
recombinant protein will always be a methionine, which is encoded by the ATG start codon. If 
desired, methionine at the N-terminus may be cleaved from the protein by in vitro incubation with 
cyanogen bromide.

Fusion proteins provide an alternative for yeast expression systems, as well as in mammalian, 
baculovirus, and bacterial expression systems. Usually, a DNA sequence encoding the N-terminal 
portion of an endogenous yeast protein, or other stable protein, is fused to the 5' end of 
heterologous coding sequences. Upon expression, this construct will provide a fusion of the two 
amino acid sequences. For example, the yeast or human superoxide dismutase (SOD) gene can be
linked at the 5' terminus of a foreign gene and expressed in yeast. The DNA sequence at the
junction of the two amino acid sequences may or may not encode a cleavable site. See eg. EP-A-0
196 056. Another example is a ubiquitin fusion protein. Such a fusion protein is made with the
ubiquitin region that preferably retains a site for a processing enzyme (eg. ubiquitin-specific
processing protease) to cleave the ubiquitin from the foreign protein. Through this method,
therefore, native foreign protein can be isolated (eg. WO88/024066).

Alternatively, foreign proteins can also be secreted from the cell into the growth media by creating 
chimeric DNA molecules that encode a fusion protein comprised of a leader sequence fragment that 
provides for secretion in yeast of the foreign protein. Preferably, there are processing sites encoded
between the leader fragment and the foreign gene that can be cleaved either in vivo or in vitro. The
leader sequence fragment usually encodes a signal peptide comprised of hydrophobic amino acids 
which direct the secretion of the protein from the cell. 

DNA encoding suitable signal sequences can be derived from genes for secreted yeast proteins, 
such as the yeast invertase gene (EP-A-0 012 873; JPO. 62,096,086) and the A-factor gene (US 
patent 4,588,684). Alternatively, leaders of non-yeast origin, such as an interferon leader, exist that
also provide for secretion in yeast (EP-A-0 060 057). 

A preferred class of secretion leaders are those that employ a fragment of the yeast alpha-factor
gene, which contains both a "pre" signal sequence, and a "pro" region. The types of alpha-factor
fragments that can be employed include the full-length pre-pro alpha-factor leader (about 83 amino
acid residues) as well as truncated alpha-factor leaders (usually about 25 to about 50 amino acid
residues) (US Patents 4,546,083 and 4,870,008; EP-A-0 324 274). Additional leaders employing
an alpha-factor leader fragment that provides for secretion include hybrid alpha-factor leaders made
with a presequence of a first yeast, but a pro-region from a second yeast alpha-factor (eg. see WO
89/02463).

Usually, transcription termination sequences recognized by yeast are regulatory regions located 3'
to the translation stop codon, and thus together with the promoter flank the coding sequence. These
sequences direct the transcription of an mRNA which can be translated into the polypeptide
encoded by the DNA. Examples of transcription terminator sequences and other yeast-recognized
termination sequences include those from genes coding for glycolytic enzymes.

Usually, the above described components, comprising a promoter, leader (if desired), coding
sequence of interest, and transcription termination sequence, are put together into expression
constructs. Expression constructs are often maintained in a replicon, such as an extrachromosomal
element (eg. plasmids) capable of stable maintenance in a host, such as yeast or bacteria. The
replicon may have two replication systems, thus allowing it to be maintained, for example, in yeast
for expression and in a prokaryotic host for cloning and amplification. Examples of such yeast-
bacteria shuttle vectors include YEp24 [Botstein et al. (1979) Gene 8:17-24], pC1/1 [Brake et al.
(1984) Proc. Natl. Acad. Sci. USA 81:4642-4646], and YRp17 [Stinchcomb et al. (1982) J. Mol.
Biol. 158:151]. In addition, a replicon may be either a high or low copy number plasmid. A high
copy number plasmid will generally have a copy number ranging from about 5 to about 200, and
usually about 10 to about 150. A host containing a high copy number plasmid will preferably have
at least about 10, and more preferably at least about 20. Either a high or low copy number vector
may be selected, depending upon the effect of the vector and the foreign protein on the host. See 
eg. Brake et al., supra. 

Alternatively, the expression constructs can be integrated into the yeast genome with an integrating
vector. Integrating vectors usually contain at least one sequence homologous to a yeast
chromosome that allows the vector to integrate, and preferably contain two homologous sequences
flanking the expression construct. Integrations appear to result from recombinations between
homologous DNA in the vector and the yeast chromosome [Orr-Weaver et al. (1983) Methods in
Enzymol. 101:228-245]. An integrating vector may be directed to a specific locus in yeast by
selecting the appropriate homologous sequence for inclusion in the vector. See Orr-Weaver et al.,
supra. One or more expression constructs may integrate, possibly affecting levels of recombinant
protein produced [Rine et al. (1983) Proc. Natl. Acad. Sci. USA 80:6750]. The chromosomal
sequences included in the vector can occur either as a single segment in the vector, which results
in the integration of the entire vector, or as two segments homologous to adjacent segments in the
chromosome and flanking the expression construct in the vector, which can result in the stable
integration of only the expression construct.

Usually, extrachromosomal and integrating expression constructs may contain selectable markers 
to allow for the selection of yeast strains that have been transformed. Selectable markers may 
include biosynthetic genes that can be expressed in the yeast host, such as ADE2, HIS4, LEU2,
TRP1, and ALG7, and the G418 resistance gene, which confer resistance in yeast cells to
tunicamycin and G418, respectively. In addition, a suitable selectable marker may also provide
yeast with the ability to grow in the presence of toxic compounds, such as metal. For example, the
presence of CUP1 allows yeast to grow in the presence of copper ions [Butt et al. (1987) Microbiol.
Rev. 51:351].

Alternatively, some of the above described components can be put together into transformation 
vectors. Transformation vectors are usually comprised of a selectable marker that is either 
maintained in a replicon or developed into an integrating vector, as described above. 

Expression and transformation vectors, either extrachromosomal replicons or integrating vectors,
have been developed for transformation into many yeasts. For example, expression vectors have
been developed for, inter alia, the following yeasts: Candida albicans [Kurtz, et al. (1986) Mol.
Cell. Biol. 6:142], Candida maltosa [Kunze, et al. (1985) J. Basic Microbiol. 25:141], Hansenula
polymorpha [Gleeson, et al. (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol.
Gen. Genet. 202:302], Kluyveromyces fragilis [Das, et al. (1984) J. Bacteriol. 158:1165],
Kluyveromyces lactis [De Louvencourt et al. (1983) J. Bacteriol. 154:131; Van den Berg et al.
(1990) Bio/Technology 8:135], Pichia guilliermondii [Kunze et al. (1985) J. Basic Microbiol.
25:141], Pichia pastoris [Cregg, et al. (1985) Mol. Cell. Biol. 5:3376; US Patent Nos. 4,837,148
and 4,929,555], Saccharomyces cerevisiae [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA
75:1929; Ito et al. (1983) J. Bacteriol. 153:163], Schizosaccharomyces pombe [Beach and Nurse
(1981) Nature 300:706], and Yarrowia lipolytica [Davidow, et al. (1985) Curr. Genet. 10:39;
Gaillardin, et al. (1985) Curr. Genet. 10:49].

Methods of introducing exogenous DNA into yeast hosts are well-known in the art, and usually
include either the transformation of spheroplasts or of intact yeast cells treated with alkali cations.
Transformation procedures usually vary with the yeast species to be transformed. See eg. [Kurtz
et al. (1986) Mol. Cell. Biol. 6:142; Kunze et al. (1985) J. Basic Microbiol. 25:141; Candida];
[Gleeson et al. (1986) J. Gen. Microbiol. 132:3459; Roggenkamp et al. (1986) Mol. Gen. Genet.
202:302; Hansenula]; [Das et al. (1984) J. Bacteriol. 158:1165; De Louvencourt et al. (1983) J.
Bacteriol. 154:1165; Van den Berg et al. (1990) Bio/Technology 8:135; Kluyveromyces]; [Cregg
et al. (1985) Mol. Cell. Biol. 5:3376; Kunze et al. (1985) J. Basic Microbiol. 25:141; US Patent
Nos. 4,837,148 and 4,929,555; Pichia]; [Hinnen et al. (1978) Proc. Natl. Acad. Sci. USA 75:1929;
Ito et al. (1983) J. Bacteriol. 153:163; Saccharomyces]; [Beach and Nurse (1981) Nature 300:706;
Schizosaccharomyces]; [Davidow et al. (1985) Curr. Genet. 10:39; Gaillardin et al. (1985) Curr.
Genet. 10:49; Yarrowia].

Antibodies 

As used herein, the term "antibody" refers to a polypeptide or group of polypeptides composed of
at least one antibody combining site. An "antibody combining site" is the three-dimensional 
binding space with an internal surface shape and charge distribution complementary to the features 
of an epitope of an antigen, which allows a binding of the antibody with the antigen. "Antibody" 
includes, for example, vertebrate antibodies, hybrid antibodies, chimeric antibodies, humanised 
antibodies, altered antibodies, univalent antibodies, Fab proteins, and single domain antibodies.

Antibodies against the proteins of the invention are useful for affinity chromatography, 
immunoassays, and distinguishing/identifying Neisserial proteins. 

Antibodies to the proteins of the invention, both polyclonal and monoclonal, may be prepared by 
conventional methods. In general, the protein is first used to immunize a suitable animal, preferably
a mouse, rat, rabbit or goat. Rabbits and goats are preferred for the preparation of polyclonal sera
due to the volume of serum obtainable, and the availability of labeled anti-rabbit and anti-goat
antibodies. Immunization is generally performed by mixing or emulsifying the protein in saline,
preferably in an adjuvant such as Freund's complete adjuvant, and injecting the mixture or
emulsion parenterally (generally subcutaneously or intramuscularly). A dose of 50-200 µg/injection
is typically sufficient. Immunization is generally boosted 2-6 weeks later with one or more
injections of the protein in saline, preferably using Freund's incomplete adjuvant. One may
alternatively generate antibodies by in vitro immunization using methods known in the art, which
for the purposes of this invention is considered equivalent to in vivo immunization. Polyclonal
antisera are obtained by bleeding the immunized animal into a glass or plastic container, incubating
the blood at 25°C for one hour, followed by incubating at 4°C for 2-18 hours. The serum is
recovered by centrifugation (eg. 1,000g for 10 minutes). About 20-50 ml per bleed may be obtained
from rabbits.
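
As a simple worked illustration of the schedule described above, the window for the booster
injection can be computed from the date of the first immunization. This is a minimal sketch only:
the function name booster_window and the example date are assumptions made for illustration, and
the "2-6 weeks" interval is read as an inclusive calendar range.

```python
from datetime import date, timedelta
from typing import Tuple


def booster_window(first_immunization: date) -> Tuple[date, date]:
    """Return the earliest and latest booster dates, taking the
    '2-6 weeks later' interval literally as 14 to 42 days."""
    earliest = first_immunization + timedelta(weeks=2)
    latest = first_immunization + timedelta(weeks=6)
    return earliest, latest


# Hypothetical example date, chosen only to show the arithmetic.
earliest, latest = booster_window(date(2024, 1, 1))
print(earliest.isoformat(), latest.isoformat())  # 2024-01-15 2024-02-12
```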

Monoclonal antibodies are prepared using the standard method of Kohler & Milstein [Nature
(1975) 256:495-96], or a modification thereof. Typically, a mouse or rat is immunized as described
above. However, rather than bleeding the animal to extract serum, the spleen (and optionally
several large lymph nodes) is removed and dissociated into single cells. If desired, the spleen cells
may be screened (after removal of nonspecifically adherent cells) by applying a cell suspension to
a plate or well coated with the protein antigen. B-cells expressing membrane-bound
immunoglobulin specific for the antigen bind to the plate, and are not rinsed away with the rest of
the suspension. Resulting B-cells, or all dissociated spleen cells, are then induced to fuse with
myeloma cells to form hybridomas, and are cultured in a selective medium (eg. hypoxanthine,
aminopterin, thymidine medium, "HAT"). The resulting hybridomas are plated by limiting dilution,
and are assayed for the production of antibodies which bind specifically to the immunizing antigen
(and which do not bind to unrelated antigens). The selected MAb-secreting hybridomas are then
cultured either in vitro (eg. in tissue culture bottles or hollow fiber reactors), or in vivo (as ascites
in mice).

If desired, the antibodies (whether polyclonal or monoclonal) may be labeled using conventional 
techniques. Suitable labels include fluorophores, chromophores, radioactive atoms (particularly 32P
and 125I), electron-dense reagents, enzymes, and ligands having specific binding partners. Enzymes
are typically detected by their activity. For example, horseradish peroxidase is usually detected by
its ability to convert 3,3',5,5'-tetramethylbenzidine (TMB) to a blue pigment, quantifiable with a
spectrophotometer. "Specific binding partner" refers to a protein capable of binding a ligand
molecule with high specificity, as for example in the case of an antigen and a monoclonal antibody
specific therefor. Other specific binding partners include biotin and avidin or streptavidin, IgG and
protein A, and the numerous receptor-ligand couples known in the art. It should be understood that
the above description is not meant to categorize the various labels into distinct classes, as the same
label may serve in several different modes. For example, 125I may serve as a radioactive label or as
an electron-dense reagent. HRP may serve as enzyme or as antigen for a MAb. Further, one may
combine various labels for desired effect. For example, MAbs and avidin also require labels in the
practice of this invention: thus, one might label a MAb with biotin, and detect its presence with
avidin labeled with 125I, or with an anti-biotin MAb labeled with HRP. Other permutations and
possibilities will be readily apparent to those of ordinary skill in the art, and are considered as
equivalents within the scope of the instant invention.

Pharmaceutical Compositions

Pharmaceutical compositions can comprise either polypeptides, antibodies, or nucleic acid of the 
invention. The pharmaceutical compositions will comprise a therapeutically effective amount of 
either polypeptides, antibodies, or polynucleotides of the claimed invention. 

The term "therapeutically effective amount" as used herein refers to an amount of a therapeutic
agent to treat, ameliorate, or prevent a desired disease or condition, or to exhibit a detectable
therapeutic or preventative effect. The effect can be detected by, for example, chemical markers or
antigen levels. Therapeutic effects also include reduction in physical symptoms, such as decreased
body temperature. The precise effective amount for a subject will depend upon the subject's size
and health, the nature and extent of the condition, and the therapeutics or combination of
therapeutics selected for administration. Thus, it is not useful to specify an exact effective amount 
in advance. However, the effective amount for a given situation can be determined by routine 
experimentation and is within the judgement of the clinician. 

For purposes of the present invention, an effective dose will be from about 0.01 mg/kg to 50 mg/kg
or 0.05 mg/kg to about 10 mg/kg of the DNA constructs in the individual to which it is
administered. 
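
The arithmetic implied by these per-kilogram ranges can be sketched as follows. This is an
illustrative calculation only, not a dosing recommendation: the function name dose_bounds_mg, the
constant names and the 70 kg body weight are assumptions introduced for the example, and the
"about" qualifiers in the text are ignored.

```python
from typing import Tuple

# Ranges restated from the text, in mg of DNA construct per kg of body weight.
BROAD_RANGE_MG_PER_KG = (0.01, 50.0)
NARROW_RANGE_MG_PER_KG = (0.05, 10.0)


def dose_bounds_mg(
    weight_kg: float,
    range_mg_per_kg: Tuple[float, float] = BROAD_RANGE_MG_PER_KG,
) -> Tuple[float, float]:
    """Scale a per-kilogram dose range to absolute lower and upper bounds in mg."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    low, high = range_mg_per_kg
    return low * weight_kg, high * weight_kg


# Hypothetical 70 kg individual.
broad = dose_bounds_mg(70.0)
narrow = dose_bounds_mg(70.0, NARROW_RANGE_MG_PER_KG)
print("broad range (mg): %.2f to %.2f" % broad)
print("narrow range (mg): %.2f to %.2f" % narrow)
```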

A pharmaceutical composition can also contain a pharmaceutically acceptable carrier. The term 
"pharmaceutically acceptable carrier" refers to a carrier for administration of a therapeutic agent, 
such as antibodies or a polypeptide, genes, and other therapeutic agents. The term refers to any
pharmaceutical carrier that does not itself induce the production of antibodies harmful to the
individual receiving the composition, and which may be administered without undue toxicity.
Suitable carriers may be large, slowly metabolized macromolecules such as proteins,
polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid
copolymers, and inactive virus particles. Such carriers are well known to those of ordinary skill in
the art.

Pharmaceutically acceptable salts can be used therein, for example, mineral acid salts such as 
hydrochlorides, hydrobromides, phosphates, sulfates, and the like; and the salts of organic acids 
such as acetates, propionates, malonates, benzoates, and the like. A thorough discussion of
pharmaceutically acceptable excipients is available in Remington's Pharmaceutical Sciences (Mack
Pub. Co., N.J. 1991).

Pharmaceutically acceptable carriers in therapeutic compositions may contain liquids such as water, 
saline, glycerol and ethanol. Additionally, auxiliary substances, such as wetting or emulsifying 
agents, pH buffering substances, and the like, may be present in such vehicles. Typically, the
therapeutic compositions are prepared as injectables, either as liquid solutions or suspensions; solid 
forms suitable for solution in, or suspension in, liquid vehicles prior to injection may also be 
prepared. Liposomes are included within the definition of a pharmaceutically acceptable carrier. 

Delivery Methods 

Once formulated, the compositions of the invention can be administered directly to the subject. The
subjects to be treated can be animals; in particular, human subjects can be treated. 

Direct delivery of the compositions will generally be accomplished by injection, either 
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the interstitial 
space of a tissue. The compositions can also be administered into a lesion. Other modes of 
administration include oral and pulmonary administration, suppositories, and transdermal or
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage 
treatment may be a single dose schedule or a multiple dose schedule. 

Vaccines 

Vaccines according to the invention may either be prophylactic (ie. to prevent infection) or
therapeutic (ie. to treat disease after infection).

Such vaccines comprise immunising antigen(s), immunogen(s), polypeptide(s), protein(s) or nucleic acid, 
usually in combination with "pharmaceutically acceptable carriers," which include any carrier that does 
not itself induce the production of antibodies harmful to the individual receiving the composition. 
Suitable carriers are typically large, slowly metabolized macromolecules such as proteins, 
polysaccharides, polylactic acids, polyglycolic acids, polymeric amino acids, amino acid copolymers,
lipid aggregates (such as oil droplets or liposomes), and inactive virus particles. Such carriers are well 
known to those of ordinary skill in the art. Additionally, these carriers may function as 
immunostimulating agents ("adjuvants"). Furthermore, the antigen or immunogen may be conjugated 
to a bacterial toxoid, such as a toxoid from diphtheria, tetanus, cholera, H. pylori, etc. pathogens.

Preferred adjuvants to enhance effectiveness of the composition include, but are not limited to: (1)
aluminum salts (alum), such as aluminum hydroxide, aluminum phosphate, aluminum sulfate, etc;
(2) oil-in-water emulsion formulations (with or without other specific immunostimulating agents
such as muramyl peptides (see below) or bacterial cell wall components), such as for example (a)
MF59™ (WO 90/14837; Chapter 10 in Vaccine design: the subunit and adjuvant approach, eds.
Powell & Newman, Plenum Press 1995), containing 5% Squalene, 0.5% Tween 80, and 0.5% Span
85 (optionally containing various amounts of MTP-PE (see below), although not required)
formulated into submicron particles using a microfluidizer such as Model 110Y microfluidizer
(Microfluidics, Newton, MA), (b) SAF, containing 10% Squalane, 0.4% Tween 80, 5% pluronic-
blocked polymer L121, and thr-MDP (see below) either microfluidized into a submicron emulsion
or vortexed to generate a larger particle size emulsion, and (c) Ribi™ adjuvant system (RAS), (Ribi
Immunochem, Hamilton, MT) containing 2% Squalene, 0.2% Tween 80, and one or more bacterial
cell wall components from the group consisting of monophosphorylipid A (MPL), trehalose
dimycolate (TDM), and cell wall skeleton (CWS), preferably MPL + CWS (Detox™); (3) saponin
adjuvants, such as Stimulon™ (Cambridge Bioscience, Worcester, MA) may be used or particles
generated therefrom such as ISCOMs (immunostimulating complexes); (4) Complete Freund's
Adjuvant (CFA) and Incomplete Freund's Adjuvant (IFA); (5) cytokines, such as interleukins (eg.
IL-1, IL-2, IL-4, IL-5, IL-6, IL-7, IL-12, etc.), interferons (eg. gamma interferon), macrophage
colony stimulating factor (M-CSF), tumor necrosis factor (TNF), etc; and (6) other substances that
act as immunostimulating agents to enhance the effectiveness of the composition. Alum and
MF59™ are preferred.

As mentioned above, muramyl peptides include, but are not limited to, N-acetyl-muramyl-L- 
threonyl-D-isoglutamine (thr-MDP), N-acetyl-normuramyl-L-alanyl-D-isoglutamine (nor-MDP), 
N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-
hydroxyphosphoryloxy)-ethylamine (MTP-PE), etc.

The immunogenic compositions (eg. the immunising antigen/immunogen/polypeptide/protein/ 
nucleic acid, pharmaceutically acceptable carrier, and adjuvant) typically will contain diluents, such 
as water, saline, glycerol, ethanol, etc. Additionally, auxiliary substances, such as wetting or 
emulsifying agents, pH buffering substances, and the like, may be present in such vehicles.

Typically, the immunogenic compositions are prepared as injectables, either as liquid solutions or
suspensions; solid forms suitable for solution in, or suspension in, liquid vehicles prior to injection 
may also be prepared. The preparation also may be emulsified or encapsulated in liposomes for 
enhanced adjuvant effect, as discussed above under pharmaceutically acceptable carriers. 

Immunogenic compositions used as vaccines comprise an immunologically effective amount of the
antigenic or immunogenic polypeptides, as well as any other of the above-mentioned components, 
as needed. By "immunologically effective amount", it is meant that the administration of that 
amount to an individual, either in a single dose or as part of a series, is effective for treatment or 
prevention. This amount varies depending upon the health and physical condition of the individual 
to be treated, the taxonomic group of individual to be treated (eg. nonhuman primate, primate, etc.),
the capacity of the individual's immune system to synthesize antibodies, the degree of protection 
desired, the formulation of the vaccine, the treating doctor's assessment of the medical situation, 
and other relevant factors. It is expected that the amount will fall in a relatively broad range that 
can be determined through routine trials. 

The immunogenic compositions are conventionally administered parenterally, eg. by injection,
either subcutaneously, intramuscularly, or transdermally/transcutaneously (eg. WO98/20734).
Additional formulations suitable for other modes of administration include oral and pulmonary
formulations, suppositories, and transdermal applications. Dosage treatment may be a single dose
schedule or a multiple dose schedule. The vaccine may be administered in conjunction with other
immunoregulatory agents.

As an alternative to protein-based vaccines, DNA vaccination may be employed [eg. Robinson & 
Torres (1997) Seminars in Immunology 9:271-283; Donnelly et al. (1997) Annu Rev Immunol
15:617-648; see later herein]. 

Gene Delivery Vehicles 

Gene therapy vehicles for delivery of constructs including a coding sequence of a therapeutic of
the invention, to be delivered to the mammal for expression in the mammal, can be administered
either locally or systemically. These constructs can utilize viral or non-viral vector approaches in
in vivo or ex vivo modality. Expression of such coding sequence can be induced using endogenous
mammalian or heterologous promoters. Expression of the coding sequence in vivo can be either
constitutive or regulated. 

The invention includes gene delivery vehicles capable of expressing the contemplated nucleic acid 
sequences. The gene delivery vehicle is preferably a viral vector and, more preferably, a retroviral, 
adenoviral, adeno-associated viral (AAV), herpes viral, or alphavirus vector. The viral vector can
also be an astrovirus, coronavirus, orthomyxovirus, papovavirus, paramyxovirus, parvovirus, 
picornavirus, poxvirus, or togavirus viral vector. See generally, Jolly (1994) Cancer Gene Therapy 
1:51-64; Kimura (1994) Human Gene Therapy 5:845-852; Connelly (1995) Human Gene Therapy 
6:185-193; and Kaplitt (1994) Nature Genetics 6:148-153. 

Retroviral vectors are well known in the art and we contemplate that any retroviral gene therapy
vector is employable in the invention, including B, C and D type retroviruses, xenotropic
retroviruses (for example, NZB-X1, NZB-X2 and NZB9-1 (see O'Neill (1985) J. Virol. 53:160)),
polytropic retroviruses, eg. MCF and MCF-MLV (see Kelly (1983) J. Virol. 45:291), spumaviruses
and lentiviruses. See RNA Tumor Viruses, Second Edition, Cold Spring Harbor Laboratory, 1985.

Portions of the retroviral gene therapy vector may be derived from different retroviruses. For
example, retrovector LTRs may be derived from a Murine Sarcoma Virus, a tRNA binding site 
from a Rous Sarcoma Virus, a packaging signal from a Murine Leukemia Virus, and an origin of 
second strand synthesis from an Avian Leukosis Virus. 

These recombinant retroviral vectors may be used to generate transduction competent retroviral
vector particles by introducing them into appropriate packaging cell lines (see US patent
5,591,624). Retrovirus vectors can be constructed for site-specific integration into host cell DNA
by incorporation of a chimeric integrase enzyme into the retroviral particle (see WO96/37626). It
is preferable that the recombinant viral vector is a replication defective recombinant virus.

Packaging cell lines suitable for use with the above-described retrovirus vectors are well known
in the art, are readily prepared (see WO95/30763 and WO92/05266), and can be used to create
producer cell lines (also termed vector cell lines or "VCLs") for the production of recombinant
vector particles. Preferably, the packaging cell lines are made from human parent cells (eg. HT1080
cells) or mink parent cell lines, which eliminates inactivation in human serum.

Preferred retroviruses for the construction of retroviral gene therapy vectors include Avian
Leukosis Virus, Bovine Leukemia Virus, Murine Leukemia Virus, Mink-Cell Focus-Inducing
Virus, Murine Sarcoma Virus, Reticuloendotheliosis Virus and Rous Sarcoma Virus. Particularly
preferred Murine Leukemia Viruses include 4070A and 1504A (Hartley and Rowe (1976) J Virol
19:19-25), Abelson (ATCC No. VR-999), Friend (ATCC No. VR-245), Graffi, Gross (ATCC No.
VR-590), Kirsten, Harvey Sarcoma Virus and Rauscher (ATCC No. VR-998) and Moloney Murine
Leukemia Virus (ATCC No. VR-190). Such retroviruses may be obtained from depositories or
collections such as the American Type Culture Collection ("ATCC") in Rockville, Maryland or
isolated from known sources using commonly available techniques.

Exemplary known retroviral gene therapy vectors employable in this invention include those
described in patent applications GB2200651, EP0415731, EP0345242, EP0334301, WO89/02468,
WO89/05349, WO89/09271, WO90/02806, WO90/07936, WO94/03622, WO93/25698,
WO93/25234, WO93/11230, WO93/10218, WO91/02805, WO91/02825, WO95/07994, US
5,219,740, US 4,405,712, US 4,861,719, US 4,980,289, US 4,777,127, US 5,591,624. See also Vile
(1993) Cancer Res 53:3860-3864; Vile (1993) Cancer Res 53:962-967; Ram (1993) Cancer Res
53:83-88; Takamiya (1992) J Neurosci Res 33:493-503; Baba (1993) J Neurosurg
79:729-735; Mann (1983) Cell 33:153; Cane (1984) Proc Natl Acad Sci 81:6349; and Miller (1990)
Human Gene Therapy 1.

Human adenoviral gene therapy vectors are also known in the art and employable in this invention.
See, for example, Berkner (1988) Biotechniques 6:616 and Rosenfeld (1991) Science 252:431, and
WO93/07283, WO93/06223, and WO93/07282. Exemplary known adenoviral gene therapy vectors
employable in this invention include those described in the above referenced documents and in
WO94/12649, WO93/03769, WO93/19191, WO94/28938, WO95/11984, WO95/00655,
WO95/27071, WO95/29993, WO95/34671, WO96/05320, WO94/08026, WO94/11506,
WO93/06223, WO94/24299, WO95/14102, WO95/24297, WO95/02697, WO94/28152,
WO94/24299, WO95/09241, WO95/25807, WO95/05835, WO94/18922 and WO95/09654.
Alternatively, administration of DNA linked to killed adenovirus as described in Curiel (1992)
Hum. Gene Ther. 3:147-154 may be employed. The gene delivery vehicles of the invention also
include adenovirus associated virus (AAV) vectors. Leading and preferred examples of such
vectors for use in this invention are the AAV-2 based vectors disclosed in Srivastava,
WO93/09239. Most preferred AAV vectors comprise the two AAV inverted terminal repeats in
which the native D-sequences are modified by substitution of nucleotides, such that at least 5 native
nucleotides and up to 18 native nucleotides, preferably at least 10 native nucleotides up to 18 native
nucleotides, most preferably 10 native nucleotides are retained and the remaining nucleotides of
the D-sequence are deleted or replaced with non-native nucleotides. The native D-sequences of the
AAV inverted terminal repeats are sequences of 20 consecutive nucleotides in each AAV inverted
terminal repeat (ie. there is one sequence at each end) which are not involved in HP formation. The
non-native replacement nucleotide may be any nucleotide other than the nucleotide found in the
native D-sequence in the same position. Other employable exemplary AAV vectors are pWP-19 and
pWN-1, both of which are disclosed in Nahreini (1993) Gene 124:257-262. Another example of
such an AAV vector is psub201 (see Samulski (1987) J. Virol. 61:3096). Another exemplary AAV
vector is the Double-D ITR vector. Construction of the Double-D ITR vector is disclosed in US
Patent 5,478,745. Still other vectors are those disclosed in Carter US Patent 4,797,368 and
Muzyczka US Patent 5,139,941, Chatterjee US Patent 5,474,935, and Kotin WO94/288157. Yet a
further example of an AAV vector employable in this invention is SSV9AFABTKneo, which
contains the AFP enhancer and albumin promoter and directs expression predominantly in the liver.
Its structure and construction are disclosed in Su (1996) Human Gene Therapy 7:463-470.
Additional AAV gene therapy vectors are described in US 5,354,678, US 5,173,414, US 5,139,941,
and US 5,252,479.
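
By way of illustration only, the retention criteria stated above for modified D-sequences (20 native nucleotides, of which 5 to 18 are retained, preferably 10 to 18, most preferably 10) may be checked with a short Python sketch. The sketch assumes the substitution-only case, in which the modified sequence keeps the same length and positions are compared one-to-one; deletions are not handled. The function names and both sequences are hypothetical placeholders and do not represent actual AAV D-sequences.

```python
D_SEQUENCE_LENGTH = 20  # length of the native D-sequence stated in the text


def count_retained(native: str, modified: str) -> int:
    """Count positions where the modified sequence keeps the native nucleotide.

    Assumption: substitution only, so both sequences have the same length.
    """
    if len(native) != D_SEQUENCE_LENGTH or len(modified) != D_SEQUENCE_LENGTH:
        raise ValueError("both sequences must be 20 nucleotides long")
    return sum(1 for n, m in zip(native, modified) if n == m)


def classify_retention(retained: int) -> str:
    """Map a retained-nucleotide count onto the ranges stated in the text."""
    if retained == 10:
        return "most preferred"
    if 10 <= retained <= 18:
        return "preferred"
    if 5 <= retained <= 18:
        return "within stated range"
    return "outside stated range"


# Hypothetical placeholder sequences, NOT real AAV D-sequences.
native_d = "ACGTACGTACGTACGTACGT"
# The last ten positions are each replaced with a different nucleotide.
modified_d = "ACGTACGTAC" + "TGCATGCATG"

retained_count = count_retained(native_d, modified_d)
category = classify_retention(retained_count)
```

In this placeholder case ten native nucleotides are retained, which falls in the most preferred category described above.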

The gene therapy vectors of the invention also include herpes vectors. Leading and preferred
examples are herpes simplex virus vectors containing a sequence encoding a thymidine kinase
polypeptide such as those disclosed in US 5,288,641 and EP0176170 (Roizman). Additional
exemplary herpes simplex virus vectors include HFEM/ICP6-LacZ disclosed in WO95/04139
(Wistar Institute), pHSVlac described in Geller (1988) Science 241:1667-1669 and in WO90/09441
and WO92/07945, HSV Us3::pgC-lacZ described in Fink (1992) Human Gene Therapy 3:11-19
and HSV 7134, 2 RH 105 and GAL4 described in EP 0453242 (Breakefield), and those deposited
with the ATCC as accession numbers ATCC VR-977 and ATCC VR-260.

Also contemplated are alpha virus gene therapy vectors that can be employed in this invention.
Preferred alpha virus vectors are Sindbis virus vectors. Also employable are togaviruses, Semliki
Forest virus (ATCC VR-67; ATCC VR-1247), Middleberg virus (ATCC VR-370), Ross River virus
(ATCC VR-373; ATCC VR-1246), Venezuelan equine encephalitis virus (ATCC VR-923; ATCC
VR-1250; ATCC VR-1249; ATCC VR-532), and those described in US patents 5,091,309, 5,217,879,
and WO92/10578. More particularly, those alpha virus vectors described in US Serial No. 08/405,627,
filed March 15, 1995, WO94/21792, WO92/10578, WO95/07994, US 5,091,309 and US 5,217,879
are employable. Such alpha viruses may be obtained from depositories or collections such as the
ATCC in Rockville, Maryland or isolated from known sources using commonly available
techniques. Preferably, alphavirus vectors with reduced cytotoxicity are used (see USSN
08/679640).

DNA vector systems such as eukaryotic layered expression systems are also useful for expressing
the nucleic acids of the invention. See WO95/07994 for a detailed description of eukaryotic layered
expression systems. Preferably, the eukaryotic layered expression systems of the invention are
derived from alphavirus vectors and most preferably from Sindbis viral vectors.

Other viral vectors suitable for use in the present invention include those derived from poliovirus,
for example ATCC VR-58 and those described in Evans, Nature 339 (1989) 385 and Sabin (1973)
J. Biol. Standardization 1:115; rhinovirus, for example ATCC VR-1110 and those described in
Arnold (1990) J Cell Biochem L401; pox viruses such as canary pox virus or vaccinia virus, for
example ATCC VR-111 and ATCC VR-2010 and those described in Fisher-Hoch (1989) Proc Natl
Acad Sci 86:317; Flexner (1989) Ann NY Acad Sci 569:86, Flexner (1990) Vaccine 8:17; in US
4,603,112 and US 4,769,330 and WO89/01973; SV40 virus, for example ATCC VR-305 and those
described in Mulligan (1979) Nature 277:108 and Madzak (1992) J Gen Virol 73:1533; influenza
virus, for example ATCC VR-797 and recombinant influenza viruses made employing reverse
genetics techniques as described in US 5,166,057 and in Enami (1990) Proc Natl Acad Sci
87:3802-3805; Enami & Palese (1991) J Virol 65:2711-2713 and Luytjes (1989) Cell 59:110, (see
also McMichael (1983) NEJ Med 309:13, and Yap (1978) Nature 273:238 and Nature (1979)
277:108); human immunodeficiency virus as described in EP-0386882 and in Buchschacher (1992)
J. Virol. 66:2731; measles virus, for example ATCC VR-67 and VR-1247 and those described in
EP-0440219; Aura virus, for example ATCC VR-368; Bebaru virus, for example ATCC VR-600
and ATCC VR-1240; Cabassou virus, for example ATCC VR-922; Chikungunya virus, for
example ATCC VR-64 and ATCC VR-1241; Fort Morgan Virus, for example ATCC VR-924;
Getah virus, for example ATCC VR-369 and ATCC VR-1243; Kyzylagach virus, for example
ATCC VR-927; Mayaro virus, for example ATCC VR-66; Mucambo virus, for example ATCC
VR-580 and ATCC VR-1244; Ndumu virus, for example ATCC VR-371; Pixuna virus, for
example ATCC VR-372 and ATCC VR-1245; Tonate virus, for example ATCC VR-925; Triniti
virus, for example ATCC VR-469; Una virus, for example ATCC VR-374; Whataroa virus, for
example ATCC VR-926; Y-62-33 virus, for example ATCC VR-375; O'Nyong virus, Eastern
encephalitis virus, for example ATCC VR-65 and ATCC VR-1242; Western encephalitis virus, for
example ATCC VR-70, ATCC VR-1251, ATCC VR-622 and ATCC VR-1252; and coronavirus,
for example ATCC VR-740 and those described in Hamre (1966) Proc Soc Exp Biol Med 121:190.

Delivery of the compositions of this invention into cells is not limited to the above mentioned viral
vectors. Other delivery methods and media may be employed such as, for example, nucleic acid
expression vectors, polycationic condensed DNA linked or unlinked to killed adenovirus alone, for
example see US Serial No. 08/366,787, filed December 30, 1994 and Curiel (1992) Hum Gene Ther
3:147-154, ligand linked DNA, for example see Wu (1989) J Biol Chem 264:16985-16987,
eucaryotic cell delivery vehicles cells, for example see US Serial No. 08/240,030, filed May 9,
1994, and US Serial No. 08/404,796, deposition of photopolymerized hydrogel materials,
hand-held gene transfer particle gun, as described in US Patent 5,149,655, ionizing radiation as
described in US 5,206,152 and in WO92/11033, nucleic charge neutralization or fusion with cell
membranes. Additional approaches are described in Philip (1994) Mol Cell Biol 14:2411-2418 and
in Woffendin (1994) Proc Natl Acad Sci 91:11581-11585.

Particle mediated gene transfer may be employed, for example see US Serial No. 60/023,867.
Briefly, the sequence can be inserted into conventional vectors that contain conventional control
sequences for high level expression, and then incubated with synthetic gene transfer molecules such
as polymeric DNA-binding cations like polylysine, protamine, and albumin, linked to cell targeting
ligands such as asialoorosomucoid, as described in Wu & Wu (1987) J. Biol. Chem.
262:4429-4432, insulin as described in Hucked (1990) Biochem Pharmacol 40:253-263, galactose
as described in Plank (1992) Bioconjugate Chem 3:533-539, lactose or transferrin.

Naked DNA may also be employed. Exemplary naked DNA introduction methods are described
in WO90/11092 and US 5,580,859. Uptake efficiency may be improved using biodegradable latex
beads. DNA coated latex beads are efficiently transported into cells after endocytosis initiation by
the beads. The method may be improved further by treatment of the beads to increase
hydrophobicity and thereby facilitate disruption of the endosome and release of the DNA into the
cytoplasm.

Liposomes that can act as gene delivery vehicles are described in US 5,422,120, WO95/13796,
WO94/23697, WO91/14445 and EP-524,968. As described in USSN 60/023,867, on non-viral
delivery, the nucleic acid sequences encoding a polypeptide can be inserted into conventional
vectors that contain conventional control sequences for high level expression, and then be incubated
with synthetic gene transfer molecules such as polymeric DNA-binding cations like polylysine,
protamine, and albumin, linked to cell targeting ligands such as asialoorosomucoid, insulin,
galactose, lactose, or transferrin. Other delivery systems include the use of liposomes to encapsulate
DNA comprising the gene under the control of a variety of tissue-specific or ubiquitously-active
promoters. Further non-viral delivery suitable for use includes mechanical delivery systems such
as the approach described in Woffendin et al (1994) Proc. Natl. Acad. Sci. USA
91(24):11581-11585. Moreover, the coding sequence and the product of expression of such can be
delivered through deposition of photopolymerized hydrogel materials. Other conventional methods
for gene delivery that can be used for delivery of the coding sequence include, for example, use of
hand-held gene transfer particle gun, as described in US 5,149,655; use of ionizing radiation for
activating transferred gene, as described in US 5,206,152 and WO92/11033.

Exemplary liposome and polycationic gene delivery vehicles are those described in US 5,422,120
and 4,762,915; in WO95/13796; WO94/23697; and WO91/14445; in EP-0524968; and in Stryer,
Biochemistry, pages 236-240 (1975) W.H. Freeman, San Francisco; Szoka (1980) Biochem
Biophys Acta 600:1; Bayer (1979) Biochem Biophys Acta 550:464; Rivnay (1987) Meth Enzymol
149:119; Wang (1987) Proc Natl Acad Sci 84:7851; Plant (1989) Anal Biochem 176:420.

A polynucleotide composition can comprise a therapeutically effective amount of a gene therapy
vehicle, as the term is defined above. For purposes of the present invention, an effective dose will
be from about 0.01 mg/kg to 50 mg/kg or 0.05 mg/kg to about 10 mg/kg of the DNA constructs
in the individual to which it is administered.
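
As a worked illustration of the per-kilogram ranges stated above, the following Python sketch converts them into absolute amounts of DNA construct for a given body weight. The function name and the 70 kg body weight are assumptions chosen for the example; the sketch is arithmetic only and is not dosing guidance.

```python
def dose_range_mg(body_weight_kg: float,
                  low_mg_per_kg: float = 0.01,
                  high_mg_per_kg: float = 50.0) -> tuple[float, float]:
    """Return (minimum, maximum) dose in mg for a per-kilogram dose range.

    The defaults are the broader range stated in the text
    (about 0.01 mg/kg to 50 mg/kg).
    """
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    if low_mg_per_kg > high_mg_per_kg:
        raise ValueError("lower bound exceeds upper bound")
    return (low_mg_per_kg * body_weight_kg, high_mg_per_kg * body_weight_kg)


# Hypothetical 70 kg individual (an assumed value, not taken from the text).
broad_range = dose_range_mg(70.0)               # 0.01 to 50 mg/kg
narrow_range = dose_range_mg(70.0, 0.05, 10.0)  # 0.05 to 10 mg/kg
```

For the assumed 70 kg individual, the broader range corresponds to roughly 0.7 mg to 3500 mg of DNA construct and the narrower range to roughly 3.5 mg to 700 mg.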



Delivery Methods 

Once formulated, the polynucleotide compositions of the invention can be administered (1) directly
to the subject; (2) delivered ex vivo, to cells derived from the subject; or (3) in vitro for expression
of recombinant proteins. The subjects to be treated can be mammals or birds. Also, human subjects
can be treated.

Direct delivery of the compositions will generally be accomplished by injection, either 
subcutaneously, intraperitoneally, intravenously or intramuscularly or delivered to the interstitial 
space of a tissue. The compositions can also be administered into a lesion. Other modes of
administration include oral and pulmonary administration, suppositories, and transdermal or 
transcutaneous applications (eg. see WO98/20734), needles, and gene guns or hyposprays. Dosage 
treatment may be a single dose schedule or a multiple dose schedule. 

Methods for the ex vivo delivery and reimplantation of transformed cells into a subject are known 
in the art and described in eg. WO93/14778. Examples of cells useful in ex vivo applications
include, for example, stem cells, particularly hematopoietic, lymph cells, macrophages, dendritic
cells, or tumor cells. 

Generally, delivery of nucleic acids for both ex vivo and in vitro applications can be accomplished 
by the following procedures, for example, dextran-mediated transfection, calcium phosphate 
precipitation, polybrene mediated transfection, protoplast fusion, electroporation, encapsulation of
the polynucleotide(s) in liposomes, and direct microinjection of the DNA into nuclei, all well 
known in the art. 

Polynucleotide and polypeptide pharmaceutical compositions 

In addition to the pharmaceutically acceptable carriers and salts described above, the following
additional agents can be used with polynucleotide and/or polypeptide compositions.

A. Polypeptides

One example is polypeptides, which include, without limitation: asialoorosomucoid (ASOR);
transferrin; asialoglycoproteins; antibodies; antibody fragments; ferritin; interleukins; interferons;
granulocyte-macrophage colony stimulating factor (GM-CSF); granulocyte colony stimulating
factor (G-CSF); macrophage colony stimulating factor (M-CSF); stem cell factor; and
erythropoietin. Viral antigens, such as envelope proteins, can also be used, as can proteins from
other invasive organisms, such as the 17 amino acid peptide from the circumsporozoite protein of
Plasmodium falciparum known as RII.

B. Hormones, Vitamins, etc.

Other groups that can be included are, for example: hormones, steroids, androgens, estrogens, 
thyroid hormone, or vitamins, folic acid. 

C. Polyalkylenes, Polysaccharides, etc.

Also, polyalkylene glycol can be included with the desired polynucleotides/polypeptides. In a
preferred embodiment, the polyalkylene glycol is polyethylene glycol. In addition, mono-, di-, or
polysaccharides can be included. In a preferred embodiment of this aspect, the polysaccharide is
dextran or DEAE-dextran. Chitosan and poly(lactide-co-glycolide) can also be included.

D. Lipids and Liposomes

The desired polynucleotide/polypeptide can also be encapsulated in lipids or packaged in liposomes
prior to delivery to the subject or to cells derived therefrom. 

Lipid encapsulation is generally accomplished using liposomes which are able to stably bind or 
entrap and retain nucleic acid. The ratio of condensed polynucleotide to lipid preparation can vary 
but will generally be around 1:1 (mg DNA:micromoles lipid), or more of lipid. For a review of the
use of liposomes as carriers for delivery of nucleic acids, see, Hug and Sleight (1991) Biochim.
Biophys. Acta. 1097:1-17; Straubinger (1983) Meth. Enzymol. 101:512-527. 

Liposomal preparations for use in the present invention include cationic (positively charged), 
anionic (negatively charged) and neutral preparations. Cationic liposomes have been shown to 
mediate intracellular delivery of plasmid DNA (Felgner (1987) Proc. Natl. Acad. Sci. USA
84:7413-7416); mRNA (Malone (1989) Proc. Natl. Acad. Sci. USA 86:6077-6081); and purified
transcription factors (Debs (1990) J. Biol. Chem. 265:10189-10192), in functional form. 

Cationic liposomes are readily available. For example, N[1-(2,3-dioleyloxy)propyl]-N,N,N-triethylammonium
(DOTMA) liposomes are available under the trademark Lipofectin, from GIBCO BRL, Grand
Island, NY. (See, also, Felgner supra). Other commercially available liposomes include
transfectace (DDAB/DOPE) and DOTAP/DOPE (Boehringer). Other cationic liposomes can be
prepared from readily available materials using techniques well known in the art. See, eg. Szoka
(1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; WO90/11092 for a description of the synthesis
of DOTAP (1,2-bis(oleoyloxy)-3-(trimethylammonio)propane) liposomes.

Similarly, anionic and neutral liposomes are readily available, such as from Avanti Polar Lipids
(Birmingham, AL), or can be easily prepared using readily available materials. Such materials
include phosphatidyl choline, cholesterol, phosphatidyl ethanolamine, dioleoylphosphatidyl choline
(DOPC), dioleoylphosphatidyl glycerol (DOPG), dioleoylphosphatidyl ethanolamine (DOPE),
among others. These materials can also be mixed with the DOTMA and DOTAP starting materials
in appropriate ratios. Methods for making liposomes using these materials are well known in the
art.

The liposomes can comprise multilamellar vesicles (MLVs), small unilamellar vesicles (SUVs),
or large unilamellar vesicles (LUVs). The various liposome-nucleic acid complexes are prepared
using methods known in the art. See eg. Straubinger (1983) Meth. Enzymol. 101:512-527; Szoka
(1978) Proc. Natl. Acad. Sci. USA 75:4194-4198; Papahadjopoulos (1975) Biochim. Biophys. Acta
394:483; Wilson (1979) Cell 17:77; Deamer & Bangham (1976) Biochim. Biophys. Acta 443:629;
Ostro (1977) Biochem. Biophys. Res. Commun. 76:836; Fraley (1979) Proc. Natl. Acad. Sci. USA
76:3348; Enoch & Strittmatter (1979) Proc. Natl. Acad. Sci. USA 76:145; Fraley (1980) J. Biol.
Chem. 255:10431; Szoka & Papahadjopoulos (1978) Proc. Natl. Acad. Sci. USA 75:145;
and Schaefer-Ridder (1982) Science 215:166.

E. Lipoproteins

In addition, lipoproteins can be included with the polynucleotide/polypeptide to be delivered. 
Examples of lipoproteins to be utilized include: chylomicrons, HDL, IDL, LDL, and VLDL. 
Mutants, fragments, or fusions of these proteins can also be used. Also, modifications of naturally
occurring lipoproteins can be used, such as acetylated LDL. These lipoproteins can target the
delivery of polynucleotides to cells expressing lipoprotein receptors. Preferably, if lipoproteins are
included with the polynucleotide to be delivered, no other targeting ligand is included in the
composition. 

Naturally occurring lipoproteins comprise a lipid and a protein portion. The protein portions are
known as apoproteins. At the present, apoproteins A, B, C, D, and E have been isolated and
identified. At least two of these contain several proteins, designated by Roman numerals, AI, AII,
AIV; CI, CII, CIII.

A lipoprotein can comprise more than one apoprotein. For example, naturally occurring
chylomicrons comprise A, B, C, and E apoproteins; over time these lipoproteins lose A and acquire
C and E apoproteins. VLDL comprises A, B, C, and E apoproteins; LDL comprises apoprotein B; and
HDL comprises apoproteins A, C, and E.

The amino acid sequences of these apoproteins are known and are described in, for example,
Breslow (1985) Annu Rev. Biochem 54:699; Law (1986) Adv. Exp Med. Biol. 151:162; Chen (1986)
J Biol Chem 261:12918; Kane (1980) Proc Natl Acad Sci USA 77:2465; and Utermann (1984) Hum
Genet 65:232.

Lipoproteins contain a variety of lipids including triglycerides, cholesterol (free and esters), and
phospholipids. The composition of the lipids varies in naturally occurring lipoproteins. For example,
chylomicrons comprise mainly triglycerides. A more detailed description of the lipid content of
naturally occurring lipoproteins can be found, for example, in Meth. Enzymol. 128 (1986). The
composition of the lipids is chosen to aid in conformation of the apoprotein for receptor binding
activity. The composition of lipids can also be chosen to facilitate hydrophobic interaction and
association with the polynucleotide binding molecule.

Naturally occurring lipoproteins can be isolated from serum by ultracentrifugation, for instance.
Such methods are described in Meth. Enzymol. (supra); Pitas (1980) J. Biochem. 255:5454-5460
and Mahey (1979) J Clin. Invest 64:743-750. Lipoproteins can also be produced by in vitro or
recombinant methods by expression of the apoprotein genes in a desired host cell. See, for example,
Atkinson (1986) Annu Rev Biophys Chem 15:403 and Radding (1958) Biochim Biophys Acta
30:443. Lipoproteins can also be purchased from commercial suppliers, such as Biomedical
Technologies, Inc., Stoughton, Massachusetts, USA. Further description of lipoproteins can be
found in Zuckermann et al. PCT/US97/14465.

F. Polycationic Agents

Polycationic agents can be included, with or without lipoprotein, in a composition with the desired 
polynucleotide/polypeptide to be delivered. 

Polycationic agents typically exhibit a net positive charge at physiologically relevant pH and are
capable of neutralizing the electrical charge of nucleic acids to facilitate delivery to a desired
location. These agents have in vitro, ex vivo, and in vivo applications. Polycationic agents can
be used to deliver nucleic acids to a living subject either intramuscularly, subcutaneously, etc.

The following are examples of useful polypeptides as polycationic agents: polylysine, polyarginine,
polyornithine, and protamine. Other examples include histones, protamines, human serum albumin,
DNA binding proteins, non-histone chromosomal proteins, and coat proteins from DNA viruses, such
as φX174. Transcriptional factors also contain domains that bind DNA and therefore may be useful
as nucleic acid condensing agents. Briefly, transcriptional factors such as C/CEBP, c-jun, c-fos,
AP-1, AP-2, AP-3, CPF, Prot-1, Sp-1, Oct-1, Oct-2, CREP, and TFIID contain basic domains that
bind DNA sequences.

Organic polycationic agents include: spermine, spermidine, and putrescine.

The dimensions and the physical properties of a polycationic agent can be extrapolated from the
list above, to construct other polypeptide polycationic agents or to produce synthetic polycationic
agents. 

Synthetic polycationic agents which are useful include, for example, DEAE-dextran and polybrene.
Lipofectin™ and lipofectAMINE™ are monomers that form polycationic complexes when
combined with polynucleotides/polypeptides. 

Immunodiagnostic Assays

Neisserial antigens of the invention can be used in immunoassays to detect antibody levels (or,
conversely, anti-Neisserial antibodies can be used to detect antigen levels). Immunoassays based
on well defined, recombinant antigens can be developed to replace invasive diagnostic methods.
Antibodies to Neisserial proteins within biological samples, including for example, blood or serum
samples, can be detected. Design of the immunoassays is subject to a great deal of variation, and
a variety of these are known in the art. Protocols for the immunoassay may be based, for example,
upon competition, or direct reaction, or sandwich type assays. Protocols may also, for example, use
solid supports, or may be by immunoprecipitation. Most assays involve the use of labeled antibody
or polypeptide; the labels may be, for example, fluorescent, chemiluminescent, radioactive, or dye
molecules. Assays which amplify the signals from the probe are also known; examples of which
are assays which utilize biotin and avidin, and enzyme-labeled and mediated immunoassays, such
as ELISA assays.

Kits suitable for immunodiagnosis and containing the appropriate labeled reagents are constructed
by packaging the appropriate materials, including the compositions of the invention, in suitable
containers, along with the remaining reagents and materials (for example, suitable buffers, salt
solutions, etc.) required for the conduct of the assay, as well as a suitable set of assay instructions.

Nucleic Acid Hybridisation 

"Hybridization" refers to the association of two nucleic acid sequences to one another by hydrogen 
bonding. Typically, one sequence will be fixed to a solid support and the other will be free in
solution. Then, the two sequences will be placed in contact with one another under conditions that 
favor hydrogen bonding. Factors that affect this bonding include: the type and volume of solvent; 
reaction temperature; time of hybridization; agitation; agents to block the non-specific attachment 
of the liquid phase sequence to the solid support (Denhardt's reagent or BLOTTO); concentration 
of the sequences; use of compounds to increase the rate of association of sequences (dextran sulfate
or polyethylene glycol); and the stringency of the washing conditions following hybridization. See 
Sambrook et al. [supra] Volume 2, chapter 9, pages 9.47 to 9.57. 

"Stringency" refers to conditions in a hybridization reaction that favor association of very similar
sequences over sequences that differ. For example, the combination of temperature and salt
concentration should be chosen that is approximately 12° to 20°C below the calculated Tm of the
hybrid under study. The temperature and salt conditions can often be determined empirically in
preliminary experiments in which samples of genomic DNA immobilized on filters are hybridized 
to the sequence of interest and then washed under conditions of different stringencies. See 
Sambrook et al. at page 9.50. 

20 Variables to consider when performing, for example, a Southern blot are (1) the complexity of the 
DNA being blotted and (2) the homology between the probe and the sequences being detected. The 
total amount of the fragment(s) to be studied can vary a magnitude of 10, from 0.1 to lug for a 
plasmid or phage digest to 10" 9 to 10" 8 g for a single copy gene in a highly complex eukaryotic 
genome. For lower complexity polynucleotides, substantially shorter blotting, hybridization, and 

25 exposure times, a smaller amount of starting polynucleotides, and lower specific activity of probes 
can be used. For example, a single-copy yeast gene can be detected with an exposure time of only 
1 hour starting with 1 u.g of yeast DNA, blotting for two hours, and hybridizing for 4-8 hours with 
a probe of 10 8 cpm/ug. For a single-copy mammalian gene a conservative approach would start 
with 10 jig of DNA, blot overnight, and hybridize overnight in the presence of 10% dextran sulfate 

30 using a probe of greater than 10 8 cpm/ug, resulting in an exposure time of -24 hours. 
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Several factors can affect the melting temperature (Tm) of a DNA-DNA hybrid between the probe 
and the fragment of interest, and consequently, the appropriate conditions for hybridization and 
washing. In many cases the probe is not 100% homologous to the fragment. Other commonly 
encountered variables include the length and total G+C content of the hybridizing sequences and 
5 the ionic strength and formamide content of the hybridization buffer. The effects of all of these 
factors can be approximated by a single equation: 

Tm = 81 + 16.6(log10 Ci) + 0.4[%(G + C)] - 0.6(%formamide) - 600/n - 1.5(%mismatch) 

where Ci is the salt concentration (monovalent ions) and n is the length of the hybrid in base pairs 
(slightly modified from Meinkoth & Wahl (1984) Anal. Biochem. 138: 267-284). 
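
The equation above can be evaluated directly. The following is a minimal sketch in Python, not part of the original disclosure; the function name and argument names are illustrative assumptions, and the coefficients are copied from the equation as written here.

```python
import math


def hybrid_tm(salt_molar, gc_percent, formamide_percent, length_bp,
              mismatch_percent=0.0):
    """Approximate Tm (deg C) of a DNA-DNA hybrid.

    Implements: Tm = 81 + 16.6*log10(Ci) + 0.4*%(G+C)
                     - 0.6*%formamide - 600/n - 1.5*%mismatch
    salt_molar is the monovalent ion concentration Ci (mol/l) and
    length_bp is the hybrid length n in base pairs.
    """
    if salt_molar <= 0:
        raise ValueError("salt concentration must be positive")
    if length_bp <= 0:
        raise ValueError("hybrid length must be positive")
    return (81.0
            + 16.6 * math.log10(salt_molar)
            + 0.4 * gc_percent
            - 0.6 * formamide_percent
            - 600.0 / length_bp
            - 1.5 * mismatch_percent)


# Hypothetical inputs: 1 M salt, 50% G+C, 50% formamide, 600 bp hybrid.
perfect = hybrid_tm(1.0, 50.0, 50.0, 600)
mismatched = hybrid_tm(1.0, 50.0, 50.0, 600, mismatch_percent=10.0)
print(perfect, mismatched)
```

With these illustrative inputs, each 1% of mismatch lowers the estimate by 1.5°C, which is why less homologous probes call for lower hybridization temperatures.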

In designing a hybridization experiment, some factors affecting nucleic acid hybridization can be 
conveniently altered. The temperature of the hybridization and washes and the salt concentration 
during the washes are the simplest to adjust. As the temperature of the hybridization (i.e. the 
stringency) increases, hybridization between nonhomologous strands becomes less likely and, 
as a result, background decreases. If the radiolabeled probe is not completely 
homologous with the immobilized fragment (as is frequently the case in gene family and 
interspecies hybridization experiments), the hybridization temperature must be reduced, and 
background will increase. The temperature of the washes affects the intensity of the hybridizing 
band and the degree of background in a similar manner. The stringency of the washes is also 
increased with decreasing salt concentrations. 

In general, convenient hybridization temperatures in the presence of 50% formamide are 42°C for 
a probe which is 95% to 100% homologous to the target fragment, 37°C for 90% to 95% homology, 
and 32°C for 85% to 90% homology. For lower homologies, formamide content should be lowered 
and temperature adjusted accordingly, using the equation above. If the homology between the probe 
and the target fragment is not known, the simplest approach is to start with both hybridization and 
wash conditions which are nonstringent. If non-specific bands or high background are observed 
after autoradiography, the filter can be washed at high stringency and reexposed. If the time 
required for exposure makes this approach impractical, several hybridization and/or washing 
stringencies should be tested in parallel. 

Nucleic Acid Probe Assays 

Methods such as PCR, branched DNA probe assays, or blotting techniques utilizing nucleic acid 
probes according to the invention can determine the presence of cDNA or mRNA. A probe is said 
to "hybridize" with a sequence of the invention if it can form a duplex or double stranded complex, 
which is stable enough to be detected. 

The nucleic acid probes will hybridize to the Neisserial nucleotide sequences of the invention 
(including both sense and antisense strands). Though many different nucleotide sequences will 
encode the amino acid sequence, the native Neisserial sequence is preferred because it is the actual 
sequence present in cells. mRNA represents a coding sequence and so a probe should be 
complementary to the coding sequence; single-stranded cDNA is complementary to mRNA, and 
so a cDNA probe should be complementary to the non-coding sequence. 

The probe sequence need not be identical to the Neisserial sequence (or its complement); some 
variation in the sequence and length can lead to increased assay sensitivity if the nucleic acid probe 
can form a duplex with target nucleotides, which can be detected. Also, the nucleic acid probe can 
include additional nucleotides to stabilize the formed duplex. Additional Neisserial sequence may 
also be helpful as a label to detect the formed duplex. For example, a non-complementary 
nucleotide sequence may be attached to the 5' end of the probe, with the remainder of the probe 
sequence being complementary to a Neisserial sequence. Alternatively, non-complementary bases 
or longer sequences can be interspersed into the probe, provided that the probe sequence has 
sufficient complementarity with a Neisserial sequence in order to hybridize therewith and 
thereby form a duplex which can be detected. 

The exact length and sequence of the probe will depend on the hybridization conditions, such as 
temperature, salt condition and the like. For example, for diagnostic applications, depending on the 
complexity of the analyte sequence, the nucleic acid probe typically contains at least 10-20 
nucleotides, preferably 15-25, and more preferably at least 30 nucleotides, although it may be 
shorter than this. Short primers generally require cooler temperatures to form sufficiently stable 
hybrid complexes with the template. 

Probes may be produced by synthetic procedures, such as the triester method of Matteucci et al. 
[J. Am. Chem. Soc. (1981) 103:3185], or according to Urdea et al. [Proc. Natl. Acad. Sci. USA 
(1983) 80: 7461], or using commercially available automated oligonucleotide synthesizers. 

The chemical nature of the probe can be selected according to preference. For certain applications, 
DNA or RNA are appropriate. For other applications, modifications may be incorporated, eg. 
backbone modifications, such as phosphorothioates or methylphosphonates, can be used to increase 
in vivo half-life, alter RNA affinity, increase nuclease resistance etc. [eg. see Agrawal & Iyer 
(1995) Curr Opin Biotechnol 6:12-19; Agrawal (1996) TIBTECH 14:376-387]; analogues such as 
peptide nucleic acids may also be used [eg. see Corey (1997) TIBTECH 15:224-229; Buchardt et 
al. (1993) TIBTECH 11:384-386]. 

Alternatively, the polymerase chain reaction (PCR) is another well-known means for detecting 
small amounts of target nucleic acids. The assay is described in: Mullis et al. [Meth. Enzymol. 
(1987) 155: 335-350]; US patents 4,683,195 and 4,683,202. Two "primer" nucleotides hybridize 
with the target nucleic acids and are used to prime the reaction. The primers can comprise sequence 
that does not hybridize to the sequence of the amplification target (or its complement) to aid with 
duplex stability or, for example, to incorporate a convenient restriction site. Typically, such 
sequence will flank the desired Neisserial sequence. 

A thermostable polymerase creates copies of target nucleic acids from the primers using the 
original target nucleic acids as a template. After a threshold amount of target nucleic acids are 
generated by the polymerase, they can be detected by more traditional methods, such as Southern 
blots. When using the Southern blot method, the labelled probe will hybridize to the Neisserial 
sequence (or its complement). 

Also, mRNA or cDNA can be detected by traditional blotting techniques described in Sambrook 
et al. [supra]. mRNA, or cDNA generated from mRNA using a polymerase enzyme, can be purified 
and separated using gel electrophoresis. The nucleic acids on the gel are then blotted onto a solid 
support, such as nitrocellulose. The solid support is exposed to a labelled probe and then washed 
to remove any unhybridized probe. Next, the duplexes containing the labelled probe are detected. 
Typically, the probe is labelled with a radioactive moiety. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figures 1-20 show biochemical data obtained in the Examples, and also sequence analysis, for 
ORFs 37, 5, 2, 15, 22, 28, 32, 4, 61, 76, 89, 97, 106, 138, 23, 25, 27, 79, 85 and 132. M1 and M2 
are molecular weight markers. Arrows indicate the position of the main recombinant product or, 
in Western blots, the position of the main N. meningitidis immunoreactive band. TP indicates 
N. meningitidis total protein extract; OMV indicates N. meningitidis outer membrane vesicle 
preparation. In bactericidal assay results: a diamond (♦) shows preimmune data; a triangle (▲) 
shows GST control data; a circle (•) shows data with recombinant N. meningitidis protein. 
Computer analyses show a hydrophilicity plot (upper), an antigenic index plot (middle), and an 
AMPHI analysis (lower). The AMPHI program has been used to predict T-cell epitopes [Gao et 
al. (1989) J. Immunol. 143:3007; Roberts et al. (1996) AIDS Res Hum Retrovir 12:593; Quakyi et 
al. (1992) Scand J Immunol suppl. 11:9] and is available in the Protean package of DNASTAR, Inc. 
(1228 South Park Street, Madison, Wisconsin 53715 USA). 

Figure 21 shows an alignment comparison of amino acid sequences for ORF 4 for several strains 
of Neisseria. Dark shading indicates regions of homology, and gray shading indicates the 
conservation of amino acids with similar characteristics. The Figure demonstrates a high degree 
of conservation among the various strains, further confirming its utility as an antigen for both 
vaccines and diagnostics. 



EXAMPLES 

The examples describe nucleic acid sequences which have been identified in N. meningitidis, along 
with their putative translation products, and also those of N. gonorrhoeae. Not all of the nucleic acid 
sequences are complete, ie. they encode less than the full-length wild-type protein. 

The examples are generally in the following format: 

• a nucleotide sequence which has been identified in N. meningitidis (strain B) 

• the putative translation product of this sequence 

• a computer analysis of the translation product based on database comparisons 

• corresponding gene and protein sequences identified in N. meningitidis (strain A) and in 
N. gonorrhoeae 

• a description of the characteristics of the proteins which indicates that they might be 
suitably antigenic 

• results of biochemical analysis (expression, purification, ELISA, FACS etc.) 

The examples typically include details of sequence identity between species and strains. Proteins 
that are similar in sequence are generally similar in both structure and function, and the sequence 
identity often indicates a common evolutionary origin. Comparison with sequences of proteins of 
known function is widely used as a guide for the assignment of putative protein function to a new 
sequence and has proved particularly useful in whole-genome analyses. 

Sequence comparisons were performed at NCBI (http://www.ncbi.nlm.nih.gov) using the 
algorithms BLAST, BLAST2, BLASTn, BLASTp, tBLASTn, BLASTx, & tBLASTx [eg. see also 
Altschul et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database 
search programs. Nucleic Acids Research 25:3389-3402]. Searches were performed against the 
following databases: non-redundant GenBank+EMBL+DDBJ+PDB sequences and non-redundant 
GenBank CDS translations+PDB+SwissProt+SPupdate+PIR sequences. 

To compare Meningococcal and Gonococcal sequences, the tBLASTx algorithm was used, as 
implemented at http://www.genome.ou.edu/gono_blast.html. The FASTA algorithm was also used 
to compare the ORFs (from GCG Wisconsin Package, version 9.0). 

Dots within nucleotide sequences (eg. position 495 in SEQ ID 11) represent nucleotides which have 
been arbitrarily introduced in order to maintain a reading frame. In the same way, 
double-underlined nucleotides were removed. Lower case letters (eg. position 496 in SEQ ID 11) represent 
ambiguities which arose during alignment of independent sequencing reactions (some of the 
nucleotide sequences in the examples are derived from combining the results of two or more 
experiments). 

Nucleotide sequences were scanned in all six reading frames to predict the presence of hydrophobic 
domains using an algorithm based on the statistical studies of Esposti et al. [Critical evaluation of 
the hydropathy of membrane proteins (1990) Eur J Biochem 190:207-219]. These domains 
represent potential transmembrane regions or hydrophobic leader sequences. 

Open reading frames were predicted from fragmented nucleotide sequences using the program 
ORFFINDER (NCBI). 

Underlined amino acid sequences indicate possible transmembrane domains or leader sequences 
in the ORFs, as predicted by the PSORT algorithm (http://www.psort.nibb.ac.jp). Functional 
domains were also predicted using the MOTIFS program (GCG Wisconsin & PROSITE). 

Various tests can be used to assess the in vivo immunogenicity of the proteins identified in the 
examples. For example, the proteins can be expressed recombinantly and used to screen patient sera 
by immunoblot. A positive reaction between the protein and patient serum indicates that the patient 
has previously mounted an immune response to the protein in question, ie. the protein is an 
immunogen. This method can also be used to identify immunodominant proteins. 

The recombinant protein can also be conveniently used to prepare antibodies, eg. in a mouse. These 
can be used for direct confirmation that a protein is located on the cell-surface. Labelled antibody 
(eg. fluorescent labelling for FACS) can be incubated with intact bacteria and the presence of label 
on the bacterial surface confirms the location of the protein. 

In particular, the following methods (A) to (S) were used to express, purify and biochemically 
characterise the proteins of the invention: 

A) Chromosomal DNA preparation 

N. meningitidis strain 2996 was grown to exponential phase in 100ml of GC medium, harvested by 
centrifugation, and resuspended in 5ml buffer (20% Sucrose, 50mM Tris-HCl, 50mM EDTA, pH 8). 
After 10 minutes incubation on ice, the bacteria were lysed by adding 10ml lysis solution (50mM 
NaCl, 1% Na-Sarkosyl, 50µg/ml Proteinase K), and the suspension was incubated at 37°C for 2 
hours. Two phenol extractions (equilibrated to pH 8) and one CHCl3/isoamylalcohol (24:1) 
extraction were performed. DNA was precipitated by addition of 0.3M sodium acetate and 2 
volumes ethanol, and was collected by centrifugation. The pellet was washed once with 70% 
ethanol and redissolved in 4ml buffer (10mM Tris-HCl, 1mM EDTA, pH 8). The DNA 
concentration was measured by reading the OD at 260 nm. 

B) Oligonucleotide design 

Synthetic oligonucleotide primers were designed on the basis of the coding sequence of each ORF, 
using (a) the meningococcus B sequence when available, or (b) the gonococcus/meningococcus A 
sequence, adapted to the codon preference usage of meningococcus as necessary. Any predicted 
signal peptides were omitted, by deducing the 5'-end amplification primer sequence immediately 
downstream from the predicted leader sequence. 

For most ORFs, the 5' primers included two restriction enzyme recognition sites (BamHI-NdeI, 
BamHI-NheI, or EcoRI-NheI, depending on the gene's own restriction pattern); the 3' primers 
included a XhoI restriction site. This procedure was established in order to direct the cloning of 
each amplification product (corresponding to each ORF) into two different expression systems: 
pGEX-KG (using either BamHI-XhoI or EcoRI-XhoI), and pET21b+ (using either NdeI-XhoI or 
NheI-XhoI). 

5'-end primer tail: CGCGGATCCCATATG (BamHI-NdeI) 
                    CGCGGATCCGCTAGC (BamHI-NheI) 
                    CCGGAATTCTAGCTAGC (EcoRI-NheI) 

3'-end primer tail: CCCGCTCGAG (XhoI) 
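
The text states only that the 5' tail was chosen "depending on the gene's own restriction pattern". One way to picture that choice is the sketch below, which scans an ORF for internal copies of the tail enzymes' recognition sites and returns the first tail whose sites are absent. The preference order, the function names, and the example sequences are assumptions made for illustration, not the procedure actually used.

```python
# Recognition sequences of the enzymes named in the text. All are
# palindromic, so scanning one strand is sufficient.
SITES = {
    "BamHI": "GGATCC",
    "NdeI": "CATATG",
    "NheI": "GCTAGC",
    "EcoRI": "GAATTC",
    "XhoI": "CTCGAG",
}

# 5' tails listed above (spacing removed); the order is an assumed preference.
FIVE_PRIME_TAILS = [
    (("BamHI", "NdeI"), "CGCGGATCCCATATG"),
    (("BamHI", "NheI"), "CGCGGATCCGCTAGC"),
    (("EcoRI", "NheI"), "CCGGAATTCTAGCTAGC"),
]


def internal_sites(orf):
    """Return the names of enzymes whose site occurs inside the ORF."""
    seq = orf.upper()
    return {name for name, site in SITES.items() if site in seq}


def choose_five_prime_tail(orf):
    """Pick the first tail whose two enzymes do not cut within the ORF."""
    present = internal_sites(orf)
    for enzymes, tail in FIVE_PRIME_TAILS:
        if not present.intersection(enzymes):
            return enzymes, tail
    return None  # no listed tail is compatible with this ORF


# Hypothetical ORF fragment carrying an internal NdeI site (CATATG).
print(choose_five_prime_tail("ATGAAACATATGCCC"))
```

An ORF with an internal XhoI site would additionally need a different 3' strategy, which this sketch does not address.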

For ORFs 5, 15, 17, 19, 20, 22, 27, 28, 65 & 89, two different amplifications were performed to 
clone each ORF in the two expression systems. Two different 5' primers were used for each ORF; 
the same 3' XhoI primer was used as before: 

5'-end primer tail: GGAATTCCATATGGCCATGG (NdeI) 

5'-end primer tail: CGGGATCC (BamHI) 

ORF 76 was cloned in the pTRC expression vector and expressed as an amino-terminus His-tag 
fusion. In this particular case, the predicted signal peptide was included in the final product. 
NheI-BamHI restriction sites were incorporated using primers: 

5'-end primer tail: GATCAGCTAGCCATATG (NheI) 

3'-end primer tail: CGGGATCC (BamHI) 

As well as containing the restriction enzyme recognition sequences, the primers included 
nucleotides which hybridized to the sequence to be amplified. The number of hybridizing 
nucleotides depended on the melting temperature of the whole primer, and was determined for each 
primer using the formulae: 

Tm = 4 (G+C) + 2 (A+T) (tail excluded) 

Tm = 64.9 + 0.41 (% GC) - 600/N (whole primer) 

The average melting temperatures of the selected oligos were 65-70°C for the whole oligo and 
50-55°C for the hybridising region alone. 
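
As a worked illustration of the two formulae, the sketch below applies them to the ORF 1 forward primer from Table I (tail CGCGGATCCGCTAGC, hybridising region GGACACACTTATTTCGG). The function names are illustrative assumptions; the arithmetic follows the formulae as printed.

```python
def tm_hybridising_region(seq):
    """Tm = 4(G+C) + 2(A+T), applied to the primer without its tail."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    at = s.count("A") + s.count("T")
    return 4 * gc + 2 * at


def tm_whole_primer(seq):
    """Tm = 64.9 + 0.41(%GC) - 600/N, applied to the whole primer."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("empty primer")
    gc_percent = 100.0 * (s.count("G") + s.count("C")) / n
    return 64.9 + 0.41 * gc_percent - 600.0 / n


tail = "CGCGGATCCGCTAGC"            # BamHI-NheI tail
hybridising = "GGACACACTTATTTCGG"   # ORF 1 forward, hybridising region

print(tm_hybridising_region(hybridising))       # 8 G/C and 9 A/T -> 50
print(round(tm_whole_primer(tail + hybridising), 2))
```

For this primer the hybridising region gives 50°C and the whole 32-mer gives about 70.5°C, close to the 50-55°C and 65-70°C averages quoted above.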

Table I shows the forward and reverse primers used for each amplification. In certain cases, it will 
be noted that the sequence of the primer does not exactly match the sequence in the ORF. When 
initial amplifications were performed, the complete 5' and/or 3' sequence was not known for some 
meningococcal ORFs, although the corresponding sequences had been identified in gonococcus. 
For amplification, the gonococcal sequences could thus be used as the basis for primer design, 
altered to take account of codon preference. In particular, the following codons were changed: 
ATA→ATT; TCG→TCT; CAG→CAA; AAG→AAA; GAG→GAA; CGA→CGC; CGG→CGC; 
GGG→GGC. Italicised nucleotides in Table I indicate such a change. It will be appreciated that, 
once the complete sequence has been identified, this approach is generally no longer necessary. 
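
The codon changes listed above are all synonymous substitutions. A minimal sketch of applying them to an in-frame coding sequence follows; the function name and the example sequence are hypothetical, and the sketch assumes the input starts at the first base of a codon.

```python
# Codon replacements listed in the text (gonococcal -> preferred codon).
CODON_CHANGES = {
    "ATA": "ATT", "TCG": "TCT", "CAG": "CAA", "AAG": "AAA",
    "GAG": "GAA", "CGA": "CGC", "CGG": "CGC", "GGG": "GGC",
}


def adapt_codons(coding_seq):
    """Replace listed codons in an in-frame coding sequence."""
    s = coding_seq.upper()
    if len(s) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    codons = [s[i:i + 3] for i in range(0, len(s), 3)]
    return "".join(CODON_CHANGES.get(codon, codon) for codon in codons)


# Hypothetical fragment: ATG AAG GAG CGG -> ATG AAA GAA CGC
print(adapt_codons("ATGAAGGAGCGG"))
```

Because the replacement works codon by codon, a triplet such as AAG that spans two codons out of frame is left untouched.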

TABLE I - PCR primers 

ORF | Primer | Sequence | Restriction sites 
ORF 1 | Forward | CGCGGATCCGCTAGC-GGACACACTTATTTCGG <SEQ ID 924> | BamHI-NheI 
ORF 1 | Reverse | CCCGCTCGAG-CCAGCGGTAGCCTAATT <SEQ ID 925> | XhoI 
ORF 2 | Forward | GCGGATCCCATATG-TTTGATTTCGGTTTGGG <SEQ ID 926> | BamHI-NdeI 
ORF 2 | Reverse | CCCGCTCGAG-GACGGCATAACGGCG <SEQ ID 927> | XhoI 
ORF 2-1 | Forward | GCGGATCCCATATG-TTTGATTTCGGTTTGGG <SEQ ID 928> | BamHI-NdeI 
ORF 2-1 | Reverse | CCCGCTCGAG-TGATTTACGGACGCGCA <SEQ ID 929> | XhoI 
ORF 4 | Forward | GCGGATCCCATATG-TGCGGAGGTCAAAAAGAC <SEQ ID 930> | BamHI-NdeI 
ORF 4 | Reverse | CCCGCTCGAG-TTTGGCTGCGCCTTC <SEQ ID 931> | XhoI 
ORF 5 | Forward | GGAATTCCATATGGCCATGG-TGGAAGGCGCACAACC <SEQ ID 932> | NdeI-NcoI 
ORF 5 | Forward | CGGGATCC-ATGGAAGGCGCACAAC <SEQ ID 933> | BamHI 
ORF 5 | Reverse | CCCGCTCGAG-GACTGTGCAAAAACGG <SEQ ID 934> | XhoI 
ORF 6 | Forward | CGCGGATCCCATATG-ACCCGTCAATCTCTGCA <SEQ ID 935> | BamHI-NdeI 
ORF 6 | Reverse | CCCGCTCGAG-TGCGCCGAACACTTTC <SEQ ID 936> | XhoI 
ORF 7 | Forward | CGCGGATCCGCTAGC-GCGCTGCTTTTTGTTCC <SEQ ID 937> | BamHI-NheI 
ORF 7 | Reverse | CCCGCTCGAG-TTTCAAAATATATTTGCGGA <SEQ ID 938> | XhoI 
ORF 8 | Forward | GCGGATCCCATATG-GCTCAACTGCTTCGTAC <SEQ ID 939> | BamHI-NdeI 
ORF 8 | Reverse | CCCGCTCGAG-AGCAGGCTTTGGCGC <SEQ ID 940> | XhoI 
ORF 9 | Forward | CGCGGATCCCATATG-CCGAAGGAAGTCGGAAA <SEQ ID 941> | BamHI-NdeI 
ORF 9 | Reverse | CCCGCTCGAG-TTTCCGAGGTTTTCGGG <SEQ ID 942> | XhoI 
ORF 10 | Forward | GCGGATCCCATATG-GACACAAAAGAAATCCTC <SEQ ID 943> | BamHI-NdeI 
ORF 10 | Reverse | CCCGCTCGAG-TAATGGGAAACCTTGTTTT <SEQ ID 944> | XhoI 
ORF 11 | Forward | GCGGATCCCATATG-GCGGTCAACCTCTACG <SEQ ID 945> | BamHI-NdeI 
ORF 11 | Reverse | CCCGCTCGAG-GGAAACGACTTCGCC <SEQ ID 946> | XhoI 
ORF 13 | Forward | CGCGGATCCCATATG-GCTCTGCTTTCCGCGC <SEQ ID 947> | BamHI-NdeI 
ORF 13 | Reverse | CCCGCTCGAG-AGGGTGTGTGATAATAAG <SEQ ID 948> | XhoI 
ORF 15 | Forward | GGAATTCCATATGGCCATGG-GCGGGACACTGACAG <SEQ ID 949> | NdeI-NcoI 
ORF 15 | Forward | CGGGATCC-TGCGGGACACTGACAGG <SEQ ID 950> | BamHI 
ORF 15 | Reverse | CCCGCTCGAG-AGGTTGGCCTTGTCTATG <SEQ ID 951> | XhoI 
ORF 17 | Forward | GGAATTCCATATGGCCATGG-TTGCCGGCCTGTTCG <SEQ ID > | NdeI-NcoI 
ORF 17 | Forward | CGGGATCC-ATTGCCGGCCTGTTCG <SEQ ID 953> | BamHI 
ORF 17 | Reverse | CCCGCTCGAG-AAGCAGGTTGTACAGC <SEQ ID 954> | XhoI 
ORF 18 | Forward | GCGGATCCCATATG-ATTTTGCTGCATTTGGAT <SEQ ID 955> | BamHI-NdeI 
ORF 18 | Reverse | CCCGCTCGAG-TCTTCCAATTTCTGAAAGC <SEQ ID 956> | XhoI 
ORF 19 | Forward | GGAATTCCATATGGCCATGG-TCGCCAGTGTTTTTACC <SEQ ID 957> | NdeI-NcoI 
ORF 19 | Forward | CGGGATCC-TTCGCCAGTGTTTTTACCG <SEQ ID 958> | BamHI 
ORF 19 | Reverse | CCCGCTCGAG-GGTGTTTTTGAAGCTGCC <SEQ ID 959> | XhoI 
ORF 20 | Forward | GGAATTCCATATGGCCATGG-TCGGCGCGGGTATG <SEQ ID 960> | NdeI-NcoI 
ORF 20 | Forward | CGGGATCC-TTCGGCGCGGGTATG <SEQ ID 961> | BamHI 
ORF 20 | Reverse | CCCGCTCGAG-CGGCGAGCGAGAGCA <SEQ ID 962> | XhoI 
ORF 22 | Forward | GGAATTCCATATGGCCATGG-TGATTAAAATCAAAAAAGGTCT <SEQ ID 963> | NdeI-NcoI 
ORF 22 | Forward | CGGGATCC-ATGATTAAAATCAAAAAAGGTCTAAACC <SEQ ID 964> | BamHI 
ORF 22 | Reverse | CCCGCTCGAG-ATTATGATAGCGGCCC <SEQ ID 965> | XhoI 
ORF 23 | Forward | CGCGGATCCCATATG-GATGTTTCTGTTTCAGAC <SEQ ID 966> | BamHI-NdeI 
ORF 23 | Reverse | CCCGCTCGAG-TTTAAACCGATAGGTAAACG <SEQ ID 967> | XhoI 
ORF 24 | Forward | GGAATTCCATATGGCCATGG-TGATGCCGGAAATGGTG <SEQ ID 968> | NdeI-NcoI 
ORF 24 | Forward | CGGGATCC-ATGATGCCGGAAATGGTG <SEQ ID 969> | BamHI 
ORF 24 | Reverse | CCCGCTCGAG-TGTCAGCGTGGCGCA <SEQ ID 970> | XhoI 
ORF 25 | Forward | GCGGATCCCATATG-TATCGCAAACTGATTGC <SEQ ID 971> | BamHI-NdeI 
ORF 25 | Reverse | CCCGCTCGAG-ATCGATGGAATAGCCG <SEQ ID 972> | XhoI 
ORF 26 | Forward | GCGGATCCCATATG-CAGCTGATCGACTATTC <SEQ ID 973> | BamHI-NdeI 
ORF 26 | Reverse | CCCGCTCGAG-GACATCGGCGCGTTTT <SEQ ID 974> | XhoI 
ORF 27 | Forward | GGAATTCCATATGGCCATGG-AGACCTATTCTGTTTA <SEQ ID 974> | NdeI-NcoI 
ORF 27 | Forward | CGGGATCC-CAGACCTATTCTGTTTATTTTAATC <SEQ ID 975> | BamHI 
ORF 27 | Reverse | CCCGCTCGAG-GGGTTCGATTAAATAACCAT <SEQ ID 976> | XhoI 
ORF 28 | Forward | GGAATTCCATATGGCCATGG-ACGGCTGTACGTTGATGT <SEQ ID 977> | NdeI-NcoI 
ORF 28 | Forward | CGGGATCC-AACGGCTGTACGTTGATG <SEQ ID 978> | BamHI 
ORF 28 | Reverse | CCCGCTCGAG-TTTGTCAGAGGAATTCGCG <SEQ ID 979> | XhoI 
ORF 29 | Forward | GCGGATCCCATATG-AACGGTTTGGATGCCCG <SEQ ID 980> | BamHI-NdeI 
ORF 29 | Forward | CGCGGATCCGCTAGC-AACGGTTTGGATGCCCG <SEQ ID 981> | BamHI-NheI 
ORF 29 | Reverse | CCCGCTCGAG-TTTGTCTAAGTTCCTGATATG <SEQ ID 982> | XhoI 
ORF 32 | Forward | CGCGGATCCCATATG-AATACTCCTCCTTTTG <SEQ ID 983> | BamHI-NdeI 
ORF 32 | Reverse | CCCGCTCGAG-GCGTATTTTTTGATGCTTTG <SEQ ID 984> | XhoI 
ORF 33 | Forward | GCGGATCCCATATG-ATTGATAGGGATCGTATG <SEQ ID 985> | BamHI-NdeI 
ORF 33 | Reverse | CCCGCTCGAG-TTGATCTTTCAAACGGCC <SEQ ID 986> | XhoI 
ORF 35 | Forward | GCGGATCCCATATG-TTCAGAGCTCAGCTT <SEQ ID 987> | BamHI-NdeI 
ORF 35 | Forward | CGCGGATCCGCTAGC-TTCAGAGCTCAGCTT <SEQ ID 988> | BamHI-NheI 
ORF 35 | Reverse | CCCGCTCGAG-AAACAGCCATTTGAGCGA <SEQ ID 989> | XhoI 
ORF 37 | Forward | GCGGATCCCATATG-GATGACGTATCGGATTTT <SEQ ID 990> | BamHI-NdeI 
ORF 37 | Reverse | CCCGCTCGAG-ATAGCCCGCTTTCAGG <SEQ ID 991> | XhoI 
ORF 58 | Forward | CGCGGATCCGCTAGC-TCCGAACGCGAGTGGAT <SEQ ID 992> | BamHI-NheI 
ORF 58 | Reverse | CCCGCTCGAG-AGCATTGTCCAAGGGGAC <SEQ ID 993> | XhoI 
ORF 65 | Forward | GGAATTCCATATGGCCATGG-TGCTGTATCTGAATCAAG <SEQ ID 994> | NdeI-NcoI 
ORF 65 | Forward | CGGGATCC-TTGCTGTATCTGAATCAAGG <SEQ ID 995> | BamHI 
ORF 65 | Reverse | CCCGCTCGAG-CCGCATCGGCAGACA <SEQ ID 996> | XhoI 
ORF 66 | Forward | GCGGATCCCATATG-TACGCATTTACCGCCG <SEQ ID 997> | BamHI-NdeI 
ORF 66 | Reverse | CCCGCTCGAG-TGGATTTTGCAGAGATGG <SEQ ID 998> | XhoI 
ORF 72 | Forward | CGCGGATCCCATATG-AATGCAGTAAAAATATCTGA <SEQ ID 999> | BamHI-NdeI 
ORF 72 | Reverse | CCCGCTCGAG-GCCTGAGACCTTTGCAA <SEQ ID 1000> | XhoI 
ORF 73 | Forward | GCGGATCCCATATG-AGATTTTTCGGTATCGG <SEQ ID 1001> | BamHI-NdeI 
ORF 73 | Reverse | CCCGCTCGAG-TTCATCTTTTTCATGTTCG <SEQ ID 1002> | XhoI 
ORF 75 | Forward | GCGGATCCCATATG-TCTGTCTTTCAAACGGC <SEQ ID 1003> | BamHI-NdeI 
ORF 75 | Reverse | CCCGCTCGAG-TTTGTTTTTGCAAGACAG <SEQ ID 1004> | XhoI 
ORF 76 | Forward | GATCAGCTAGCCATATG-AAACAGAAAAAAACCGC <SEQ ID 1005> | NheI-NdeI 
ORF 76 | Reverse | CGGGATCC-TTACGGTTTGACACCGTT <SEQ ID 1006> | BamHI 
ORF 79 | Forward | CGCGGATCCCATATG-GTTTCCGCCGCCG <SEQ ID 1007> | BamHI-NdeI 
ORF 79 | Reverse | CCCGCTCGAG-GTGCTGATGCGCTTCG <SEQ ID 1008> | XhoI 
ORF 83 | Forward | GCGGATCCCATATG-AAAACCCTGCTGCTGC <SEQ ID 1009> | BamHI-NdeI 
ORF 83 | Reverse | CCCGCTCGAG-GCCGCCTTTGCGGC <SEQ ID 1010> | XhoI 
ORF 84 | Forward | GCGGATCCCATATG-GCAGAGATCTGTTTG <SEQ ID 1011> | BamHI-NdeI 
ORF 84 | Reverse | CCCGCTCGAG-GTTTGCCGATCCGACCA <SEQ ID 1012> | XhoI 
ORF 85 | Forward | CGCGGATCCCATATG-GCGGTTTGGGGCGGA <SEQ ID 1013> | BamHI-NdeI 
ORF 85 | Reverse | CCCGCTCGAG-TCGGCGCGGCGGGC <SEQ ID 1014> | XhoI 
ORF 89 | Forward | GGAATTCCATATGGCCATGG-CCATACCTTCTTATCA <SEQ ID 1015> | NdeI-NcoI 
ORF 89 | Forward | CGGGATCC-GCCATACCTTCTTATCAGAG <SEQ ID 1016> | BamHI 
ORF 89 | Reverse | CCCGCTCGAG-TTTTTTGCGATTAGAAAAAGC <SEQ ID 1017> | XhoI 
ORF 97 | Forward | GCGGATCCCATATG-CATCCTGCCAGCGAAC <SEQ ID 1018> | BamHI-NdeI 
ORF 97 | Reverse | CCCGCTCGAG-TTCGCCTACGGTTTTTTG <SEQ ID 1019> | XhoI 
ORF 98 | Forward | GCGGATCCCATATG-ACGGTAACTGCGG <SEQ ID 1020> | BamHI-NdeI 
ORF 98 | Reverse | CCCGCTCGAG-TTGTTGTTCGGGCAAATC <SEQ ID 1021> | XhoI 
ORF 100 | Forward | GCGGATCCCATATG-TCGGGCATTTACACCG <SEQ ID 1022> | BamHI-NdeI 
ORF 100 | Reverse | CCCGCTCGAG-ACGGGTTTCGGCGGAA <SEQ ID 1023> | XhoI 
ORF 101 | Forward | GCGGATCCCATATG-ATTTATCAAAGAAACCTC <SEQ ID 1024> | BamHI-NdeI 
ORF 101 | Reverse | CCCGCTCGAG-TTTTCCGCCTTTCAATGT <SEQ ID 1025> | XhoI 
ORF 102 | Forward | GCGGATCCCATATG-GCAGGGCTGTTTTACC <SEQ ID 1026> | BamHI-NdeI 
ORF 102 | Reverse | CCCGCTCGAG-AAACGGTTTGAACACGAC <SEQ ID 1027> | XhoI 
ORF 103 | Forward | GCGGATCCCATATG-AACCACGACATCAC <SEQ ID 1028> | BamHI-NdeI 
ORF 103 | Reverse | CCCGCTCGAG-CAGCCACAGGACGGC <SEQ ID 1029> | XhoI 
ORF 104 | Forward | GCGGATCCCATATG-ACGTGGGGAACGC <SEQ ID 1030> | BamHI-NdeI 
ORF 104 | Reverse | CCCGCTCGAG-GCGGCGTTTGAACGGC <SEQ ID 1031> | XhoI 
ORF 105 | Forward | GCGGATCCCATATG-ACCAAATTTCAAACCCCTC <SEQ ID 1032> | BamHI-NdeI 
ORF 105 | Reverse | CCCGCTCGAG-TAAACGAATGCCGTCCAG <SEQ ID 1033> | XhoI 
ORF 106 | Forward | GCGGATCCCATATG-AGGATAACCGACGGCG <SEQ ID 1034> | BamHI-NdeI 
ORF 106 | Reverse | CCCGCTCGAG-TTTGTTCCCGATGATGTT <SEQ ID 1035> | XhoI 
ORF 109 | Forward | GCGGATCCCATATG-GAAGATTTATATATAATACTCG <SEQ ID 1036> | BamHI-NdeI 
ORF 109 | Reverse | CCCGCTCGAG-ATCAGCTTCGAACCGAAG <SEQ ID 1037> | XhoI 
ORF 110 | Forward | AAAGAATTC-ATGAGTAAATCCCGTAGATCTCCC <SEQ ID 1038> | EcoRI 
ORF 110 | Reverse | AAACTGCAG-GGAAAACCACATCCGCACTCTGCC <SEQ ID 1039> | PstI 
ORF 111 | Forward | AAAGAATTC-GCACCGCAAAAGGCAAAAACCGCA <SEQ ID 1040> | EcoRI 
ORF 111 | Reverse | AAACTGCAG-TCTGCGCGTITTCGGGCAGGGTGG <SEQ ID 1041> | PstI 
ORF 113 | Forward | AAAGAATTC-ATGAACAAAACCCTCTATCGTGTGATTTTCAACCG <SEQ ID 1042> | EcoRI 
ORF 113 | Reverse | AAACTGCAG-TTACGAATGCCTGCTTGCTCGACCGTACTG <SEQ ID 1043> | PstI 
ORF 115 | Forward | AAAGAATTC-TTGCTTGTGCAAACAGAAAAAGACGG <SEQ ID 1044> | EcoRI 
ORF 115 | Reverse | AAAAAAGTCGAC-CTATTTTTTAGGGGCJTTTGCITTGTTTGAAAAGCCTGCC <SEQ ID 1045> | SalI 
ORF 119 | Forward | AAAGAATTC-TACAACATGTATCAGGAAAACCAATACCG <SEQ ID 1046> | EcoRI 
ORF 119 | Reverse | AAACTGCAG-TTATGAAAACAGGCGCAGGGCGGTTTTGCC <SEQ ID 1047> | PstI 
ORF 120 | Forward | AAAGAATTC-GCAAGGCTACCCCAATCCGCCGTG <SEQ ID 1048> | EcoRI 
ORF 120 | Reverse | AAACTGCAG-CGGTTTGGCTGCCTGGCCGTTGAT <SEQ ID 1049> | PstI 
ORF 121 | Forward | AAAGAATTC-GCCTTGGTCTGGCTGGTTTTCGC <SEQ ID 1050> | EcoRI 
ORF 121 | Reverse | AAACTGCAG-TCATCCGCCACCCCACCTCGGCCATCCATC <SEQ ID 1051> | PstI 
 | Forward | AAAAAAGTCGAC-ATGTC2TACCGCGCAAGCAGTTCTCC <SEQ ID 1052> | SalI 
 | Reverse | AAACTGCAG-TCAGGAACACAAACGATGACGAATATCCGTATC <SEQ ID 1053> | PstI 
ORF 125 | Forward | AAAGAATTC-GCGCTGTTTTTTGCGGCGGCGTAT <SEQ ID 1054> | EcoRI 
ORF 125 | Reverse | AAACTGCAG-CGCCGTTTCAAGACGAAAAAGTCG <SEQ ID 1055> | PstI 
ORF 126 | Forward | AAAGAATTC-GCGGAAACGGTCGAAG <SEQ ID 1056> | EcoRI 
ORF 126 | Reverse | AAACTGCAG-TTAATCTTGTCTTCCGATATAC <SEQ ID 1057> | PstI 
ORF 127 | Forward | AAAGAATTC-ATGACTGATAATCGGGGGTTTACG <SEQ ID 1058> | EcoRI 
ORF 127 | Reverse | AAAAAAGTCGAC-CTTAAGTAACTTGCAGTCCTTATC <SEQ ID 1059> | SalI 
ORF 128 | Forward | AAAGAATTC-ATGCAAGCTGTCCGCTACAGGCC <SEQ ID 1060> | EcoRI 
ORF 128 | Reverse | AAACTGCAG-CTATTGCAATGCGCCGCCGCGGGAATGITTGAGCAGGCG <SEQ ID 1061> | PstI 
ORF 129 | Forward | AAAGAATTC-ATGGATTTTCGTTTTGACATTATTTACGAATACCG <SEQ ID 1062> | EcoRI 
ORF 129 | Reverse | AAACTGCAG-TTATTTTTTGATGAAATTTTGGGGCGG <SEQ ID 1063> | PstI 
ORF 130 | Forward | AAAGAATTC-GCAGTACTTGCCATITCTCGGTGCG <SEQ ID 1064> | EcoRI 
ORF 130 | Reverse | AAACTGCAG-CTCCGGATCGTCTGTAAACGCATT <SEQ ID 1065> | PstI 
ORF 131 | Forward | GCGGATCCCATATG-GAAATTCGGGCAATAAAAT <SEQ ID 1066> | BamHI-NdeI 
ORF 131 | Reverse | CCCGCTCGAG-CCAGCGGACGCGTTC <SEQ ID 1067> | XhoI 
ORF 132 | Forward | GCGGATCCCATATG-AAAGAAGCGGGGTTTG <SEQ ID 1068> | BamHI-NdeI 
ORF 132 | Reverse | CCCGCTCGAG-CCAATCTGCCAGCCGT <SEQ ID 1069> | XhoI 
ORF 133 | Forward | CGCGGATCCCATATG-GAAGATGCAGGGCGCG <SEQ ID 1070> | BamHI-NdeI 
ORF 133 | Reverse | CCCGCTCGAG-AAACTTGTAGCTCATCGT <SEQ ID 1071> | XhoI 
ORF 134 | Forward | GCGGATCCCATATG-TCTGTGCAAGCAGTATTG <SEQ ID 1072> | BamHI-NdeI 
ORF 134 | Reverse | CCCGCTCGAG-ATCCTGTGCCAATGCG <SEQ ID 1073> | XhoI 
ORF 135 | Forward | <SEQ ID 1074> | BamHI-NdeI 
ORF 135 | Reverse | CCCGCTCGAG-AAATACCGCTGAGGATG <SEQ ID 1075> | XhoI 
ORF 136 | Forward | CGCGGATCCGCTAGC-ATGAAGCGGCGTATAGCC <SEQ ID 1076> | BamHI-NheI 
ORF 136 | Reverse | CCCGCTCGAG-TTCCGAATATTTGGAACTTTT <SEQ ID 1077> | XhoI 
ORF 137 | Forward | CGCGGATCCCATATG-GGCACGGCGGGAAATA <SEQ ID 1078> | BamHI-NdeI 
ORF 137 | Reverse | CCCGCTCGAG-ATAACGGTATGCCGCC <SEQ ID 1079> | XhoI 
ORF 138 | Forward | GCGGATCCCATATG-TTTCGTTTACAATTCAGGC <SEQ ID 1080> | BamHI-NdeI 
ORF 138 | Reverse | CCCGCTCGAG-CGGCGTTTTATAGCGG <SEQ ID 1081> | XhoI 
ORF 139 | Forward | GCGGATCCCATATG-GCTTTTTTGGCGGTAATG <SEQ ID 1082> | BamHI-NdeI 
ORF 139 | Reverse | CCCGCTCGAG-TAACGTTTCCGTGCGTTT <SEQ ID 1083> | XhoI 
ORF 140 | Forward | GCGGATCCCATATG-TTGCCCACAGGCAGC <SEQ ID 1084> | BamHI-NdeI 
ORF 140 | Reverse | CCCGCTCGAG-GACGATGGCAAACAGC <SEQ ID 1085> | XhoI 
ORF 141 | Forward | GCGGATCCCATATG-CCGTCTGAAGCAGTCT <SEQ ID 1086> | BamHI-NdeI 
ORF 141 | Reverse | CCCGCTCGAG-ATCTGTTGTTTTTAAAATATT <SEQ ID 1087> | XhoI 
ORF 142 | Forward | GCGGATCCCATATG-GATAATTCTGGTAGTGAAG <SEQ ID 1088> | BamHI-NdeI 
ORF 142 | Reverse | CCCGCTCGAG-AAACGTATAGCCTACCT <SEQ ID 1089> | XhoI 
ORF 143 | Forward | GCGGATCCCATATG-GATACCGCTTTGAACCT <SEQ ID 1090> | BamHI-NdeI 
ORF 143 | Reverse | CCCGCTCGAG-AATGGCTTCCGCAATATG <SEQ ID 1091> | XhoI 
ORF 144 | Forward | GCGGATCCCATATG-ACCTTTTTACAACGTTTGC <SEQ ID 1092> | BamHI-NdeI 
ORF 144 | Reverse | CCCGCTCGAG-AGATTGTTGTTGTTTTTTCG <SEQ ID 1093> | XhoI 
ORF 147 | Forward | GCGGATCCCATATG-TCTGTCTTTCAAACGGC <SEQ ID 1094> | BamHI-NdeI 
ORF 147 | Reverse | CCCGCTCGAG-TTTGTTTTTGCAAGACAG <SEQ ID 1095> | XhoI 



NB: 

- restriction sites are underlined 

- for ORFs 110-130, where the ORF itself carries an EcoRI site (eg. ORF122), a SalI site 
was used in the forward primer instead. Similarly, where the ORF carries a PstI site (eg. 
ORFs 115 and 127), a SalI site was used in the reverse primer. 

Oligos were synthesized by a Perkin Elmer 394 DNA/RNA Synthesizer, eluted from the columns 
in 2ml NH4OH, and deprotected by 5 hours incubation at 56°C. The oligos were precipitated by 
addition of 0.3M Na-Acetate and 2 volumes ethanol. The samples were then centrifuged and the 
pellets resuspended in either 100µl or 1ml of water. OD260 was determined using a Perkin Elmer 
Lambda Bio spectrophotometer and the concentration was determined and adjusted to 2-10pmol/µl. 

C) Amplification 

The standard PCR protocol was as follows: 50-200ng of genomic DNA were used as a template 
in the presence of 20-40pM of each oligo, 400-800µM dNTPs solution, 1x PCR buffer (including 
1.5mM MgCl2), 2.5 units TaqI DNA polymerase (using Perkin-Elmer AmpliTaq, GIBCO 
Platinum, Pwo DNA polymerase, or Takara Shuzo Taq polymerase). 

In some cases, PCR was optimised by the addition of 10µl DMSO or 50µl 2M betaine. 

After a hot start (adding the polymerase during a preliminary 3 minute incubation of the whole mix 
at 95°C), each sample underwent a double-step amplification: the first 5 cycles were performed 
using the hybridization temperature of the oligo excluding the restriction enzyme tail, 
followed by 30 cycles performed according to the hybridization temperature of the whole-length 
oligos. The cycles were followed by a final 10 minute extension step at 72°C. 

The standard cycles were as follows:

                  Denaturation        Hybridisation          Elongation
First 5 cycles    30 seconds, 95°C    30 seconds, 50-55°C    30-60 seconds, 72°C
Last 30 cycles    30 seconds, 95°C    30 seconds, 65-70°C    30-60 seconds, 72°C
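
The two-stage programme, together with the 3 minute hot start and the 10 minute final extension, can be written down as a small data structure. The Python sketch below is illustrative only: the constant and function names are hypothetical, and ramp times between temperatures are ignored (an assumption), so the result is a lower bound on real run time.

```python
HOT_START_S = 3 * 60          # preliminary incubation at 95°C
FINAL_EXTENSION_S = 10 * 60   # final extension at 72°C

DENATURE_S = 30               # 95°C
HYBRIDISE_S = 30              # temperature depends on the stage
ELONGATE_S = (30, 60)         # 72°C; varies with ORF length

# (label, number of cycles, hybridisation temperature range in °C)
STAGES = [
    ("first", 5, (50, 55)),
    ("last", 30, (65, 70)),
]


def total_hold_seconds() -> tuple:
    """Return (min, max) programmed hold time in seconds, ignoring ramps."""
    cycles = sum(n for _, n, _ in STAGES)
    fixed = HOT_START_S + FINAL_EXTENSION_S
    shortest = fixed + cycles * (DENATURE_S + HYBRIDISE_S + ELONGATE_S[0])
    longest = fixed + cycles * (DENATURE_S + HYBRIDISE_S + ELONGATE_S[1])
    return shortest, longest


if __name__ == "__main__":
    lo, hi = total_hold_seconds()
    print(f"{lo / 60:.1f}-{hi / 60:.1f} minutes of programmed holds")
```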



The elongation time varied according to the length of the ORF to be amplified.
The amplifications were performed using either a 9600 or a 2400 Perkin Elmer GeneAmp PCR
System. To check the results, 1/10 of the amplification volume was loaded onto a 1-1.5% agarose
gel and the size of each amplified fragment compared with a DNA molecular weight marker.

The amplified DNA was either loaded directly on a 1% agarose gel or first precipitated with ethanol
and resuspended in a suitable volume to be loaded on a 1% agarose gel. The DNA fragment
corresponding to the right size band was then eluted and purified from gel, using the Qiagen Gel
Extraction Kit, following the instructions of the manufacturer. The final volume of the DNA
fragment was 30µl or 50µl of either water or 10mM Tris, pH 8.5.

D) Digestion of PCR fragments 

The purified DNA corresponding to the amplified fragment was split into 2 aliquots and
double-digested with:

- NdeI/XhoI or NheI/XhoI for cloning into pET-21b+ and further expression of the protein
as a C-terminus His-tag fusion

- BamHI/XhoI or EcoRI/XhoI for cloning into pGEX-KG and further expression of the
protein as N-terminus GST fusion.

- For ORF 76, NheI/BamHI for cloning into pTRC-HisA vector and further expression
of the protein as N-terminus His-tag fusion.

- EcoRI/PstI, EcoRI/SalI, SalI/PstI for cloning into pGex-His and further expression of
the protein as N-terminus His-tag fusion

Each purified DNA fragment was incubated (37°C for 3 hours to overnight) with 20 units of each
restriction enzyme (New England Biolabs) in either a 30 or 40µl final volume in the presence of
the appropriate buffer. The digestion product was then purified using the QIAquick PCR
purification kit, following the manufacturer's instructions, and eluted in a final volume of 30 or
50µl of either water or 10mM Tris-HCl, pH 8.5. The final DNA concentration was determined by
1% agarose gel electrophoresis in the presence of titrated molecular weight marker.

E) Digestion of the cloning vectors (pET22B, pGEX-KG, pTRC-His A, and pGex-His)

10µg plasmid was double-digested with 50 units of each restriction enzyme in 200µl reaction
volume in the presence of appropriate buffer by overnight incubation at 37°C. After loading the
whole digestion on a 1% agarose gel, the band corresponding to the digested vector was purified
from the gel using the Qiagen QIAquick Gel Extraction Kit and the DNA was eluted in 50µl of
10mM Tris-HCl, pH 8.5. The DNA concentration was evaluated by measuring OD260 of the sample,
and adjusted to 50µg/µl. 1µl of plasmid was used for each cloning procedure.

The vector pGEX-His is a modified pGEX-2T vector carrying a region encoding six histidine
residues upstream of the thrombin cleavage site and containing the multiple cloning site of the
vector pTRC99 (Pharmacia).

F) Cloning 

The fragments corresponding to each ORF, previously digested and purified, were ligated in both
pET22b and pGEX-KG. In a final volume of 20µl, a molar ratio of 3:1 fragment/vector was ligated using
0.5µl of NEB T4 DNA ligase (400 units/µl), in the presence of the buffer supplied by the
manufacturer. The reaction was incubated at room temperature for 3 hours. In some experiments,
ligation was performed using the Boehringer "Rapid Ligation Kit", following the manufacturer's
instructions.
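
For a fixed amount of vector, the 3:1 fragment/vector molar ratio translates into an insert mass through the usual length-proportional conversion. The Python sketch below illustrates that arithmetic; the function name and the example quantities (50ng of a 5000bp vector, a 500bp insert) are hypothetical and are not values stated in this protocol.

```python
def insert_mass_ng(vector_ng: float, vector_bp: int, insert_bp: int,
                   molar_ratio: float = 3.0) -> float:
    """Mass of insert (ng) giving `molar_ratio` insert molecules per vector
    molecule, assuming mass is proportional to length for double-stranded DNA."""
    if vector_bp <= 0 or insert_bp <= 0:
        raise ValueError("lengths must be positive")
    return vector_ng * (insert_bp / vector_bp) * molar_ratio


if __name__ == "__main__":
    # Hypothetical example: 50 ng of a 5000 bp vector and a 500 bp insert.
    print(f"{insert_mass_ng(50, 5000, 500):.1f} ng insert")
```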

In order to introduce the recombinant plasmid in a suitable strain, 100µl E. coli DH5 competent
cells were incubated with the ligase reaction solution for 40 minutes on ice, then at 37°C for 3
minutes, then, after adding 800µl LB broth, again at 37°C for 20 minutes. The cells were then
centrifuged at maximum speed in an Eppendorf microfuge and resuspended in approximately 200µl
of the supernatant. The suspension was then plated on LB ampicillin (100mg/ml).

The screening of the recombinant clones was performed by growing 5 randomly-chosen colonies
overnight at 37°C in either 2ml (pGEX or pTC clones) or 5ml (pET clones) LB broth + 100µg/ml
ampicillin. The cells were then pelleted and the DNA extracted using the Qiagen QIAprep Spin
Miniprep Kit, following the manufacturer's instructions, to a final volume of 30µl. 5µl of each
individual miniprep (approximately 1µg) were digested with either NdeI/XhoI or BamHI/XhoI and
the whole digestion loaded onto a 1-1.5% agarose gel (depending on the expected insert size), in
parallel with the molecular weight marker (1Kb DNA Ladder, GIBCO). The screening of the
positive clones was made on the basis of the correct insert size.

For the cloning of ORFs 110, 111, 113, 115, 119, 122, 125 & 130, the double-digested PCR
product was ligated into double-digested vector using EcoRI-PstI cloning sites or, for ORFs 115
& 127, EcoRI-SalI or, for ORF 122, SalI-PstI. After cloning, the recombinant plasmids were
introduced in the E.coli host W3110. Individual clones were grown overnight at 37°C in L-broth
with 50µl/ml ampicillin.

G) Expression 

Each ORF cloned into the expression vector was transformed into the strain suitable for expression
of the recombinant protein product. 1µl of each construct was used to transform 30µl of E.coli
BL21 (pGEX vector), E.coli TOP10 (pTRC vector) or E.coli BL21-DE3 (pET vector), as described
above. In the case of the pGEX-His vector, the same E.coli strain (W3110) was used for initial
cloning and expression. Single recombinant colonies were inoculated into 2ml LB+Amp
(100µg/ml), incubated at 37°C overnight, then diluted 1:30 in 20ml of LB+Amp (100µg/ml) in
100ml flasks, making sure that the OD600 ranged between 0.1 and 0.15. The flasks were incubated
at 30°C in gyratory water bath shakers until OD indicated exponential growth suitable for
induction of expression (0.4-0.8 OD for pET and pTRC vectors; 0.8-1 OD for pGEX and pGEX-His
vectors). For the pET, pTRC and pGEX-His vectors, the protein expression was induced by
addition of 1mM IPTG, whereas in the case of pGEX system the final concentration of IPTG was
0.2mM. After 3 hours incubation at 30°C, the final concentration of the sample was checked by
OD. In order to check expression, 1ml of each sample was removed, centrifuged in a microfuge,
the pellet resuspended in PBS, and analysed by 12% SDS-PAGE with Coomassie Blue staining.
The whole sample was centrifuged at 6000g and the pellet resuspended in PBS for further use.

H) GST-fusion proteins large-scale purification. 

A single colony was grown overnight at 37°C on an LB+Amp agar plate. The bacteria were inoculated
into 20ml of LB+Amp liquid culture in a water bath shaker and grown overnight. Bacteria were
diluted 1:30 into 600ml of fresh medium and allowed to grow at the optimal temperature (20-37°C)
to OD550 0.8-1. Protein expression was induced with 0.2mM IPTG followed by three hours
incubation. The culture was centrifuged at 8000rpm at 4°C. The supernatant was discarded and the
bacterial pellet was resuspended in 7.5ml cold PBS. The cells were disrupted by sonication on ice
for 30 sec at 40W using a Branson sonifier B-15, frozen and thawed twice and centrifuged again.
The supernatant was collected and mixed with 150µl Glutathione-Sepharose 4B resin (Pharmacia)
(previously washed with PBS) and incubated at room temperature for 30 minutes. The sample was
centrifuged at 700g for 5 minutes at 4°C. The resin was washed twice with 10ml cold PBS for 10
minutes, resuspended in 1ml cold PBS, and loaded on a disposable column. The resin was washed
twice with 2ml cold PBS until the flow-through reached OD280 of 0.02-0.06. The GST-fusion
protein was eluted by addition of 700µl cold Glutathione elution buffer (10mM reduced
glutathione, 50mM Tris-HCl) and fractions collected until the OD280 was 0.1. 21µl of each fraction
were loaded on a 12% SDS gel using either Biorad SDS-PAGE Molecular weight standard broad
range (M1) (200, 116.25, 97.4, 66.2, 45, 31, 21.5, 14.4, 6.5 kDa) or Amersham Rainbow Marker
(M2) (220, 66, 46, 30, 21.5, 14.3 kDa) as standards. As the MW of GST is 26kDa, this value must
be added to the MW of each GST-fusion protein.

I) His-fusion solubility analysis (ORFs 111-129) 

To analyse the solubility of the His-fusion expression products, pellets of 3ml cultures were
resuspended in buffer M1 [500µl PBS pH 7.2]. 25µl lysozyme (10mg/ml) was added and the
bacteria were incubated for 15 min at 4°C. The pellets were sonicated for 30 sec at 40W using a
Branson sonifier B-15, frozen and thawed twice and then separated again into pellet and
supernatant by a centrifugation step. The supernatant was collected and the pellet was resuspended
in buffer M2 [8M urea, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] and incubated for 3 to
4 hours at 4°C. After centrifugation, the supernatant was collected and the pellet was resuspended
in buffer M3 [6M guanidinium-HCl, 0.5M NaCl, 20mM imidazole and 0.1M NaH2PO4] overnight
at 4°C. The supernatants from all steps were analysed by SDS-PAGE.

The proteins expressed from ORFs 113, 119 and 120 were found to be soluble in PBS, whereas
ORFs 111, 122, 126 and 129 need urea and ORFs 125 and 127 need guanidinium-HCl for their
solubilization.

J) His-fusion large-scale purification. 

A single colony was grown overnight at 37°C on an LB+Amp agar plate. The bacteria were
inoculated into 20ml of LB+Amp liquid culture and incubated overnight in a water bath shaker.
Bacteria were diluted 1:30 into 600ml fresh medium and allowed to grow at the optimal
temperature (20-37°C) to OD550 0.6-0.8. Protein expression was induced by addition of 1mM IPTG
and the culture further incubated for three hours. The culture was centrifuged at 8000rpm at 4°C,
the supernatant was discarded and the bacterial pellet was resuspended in 7.5ml of either (i) cold
buffer A (300mM NaCl, 50mM phosphate buffer, 10mM imidazole, pH 8) for soluble proteins or
(ii) buffer B (urea 8M, 10mM Tris-HCl, 100mM phosphate buffer, pH 8.8) for insoluble proteins.

The cells were disrupted by sonication on ice for 30 sec at 40W using a Branson sonifier B-15,
frozen and thawed two times and centrifuged again.

For insoluble proteins, the supernatant was stored at -20°C, while the pellets were resuspended in
2ml buffer C (6M guanidine hydrochloride, 100mM phosphate buffer, 10mM Tris-HCl, pH 7.5)
and treated in a homogenizer for 10 cycles. The product was centrifuged at 13000rpm for 40
minutes.

Supernatants were collected and mixed with 150µl Ni2+-resin (Pharmacia) (previously washed with
either buffer A or buffer B, as appropriate) and incubated at room temperature with gentle agitation
for 30 minutes. The sample was centrifuged at 700g for 5 minutes at 4°C. The resin was washed
twice with 10ml buffer A or B for 10 minutes, resuspended in 1ml buffer A or B and loaded on a
disposable column. The resin was washed at either (i) 4°C with 2ml cold buffer A or (ii) room
temperature with 2ml buffer B, until the flow-through reached OD280 of 0.02-0.06.

The resin was washed with either (i) 2ml cold 20mM imidazole buffer (300mM NaCl, 50mM
phosphate buffer, 20mM imidazole, pH 8) or (ii) buffer D (urea 8M, 10mM Tris-HCl, 100mM
phosphate buffer, pH 6.3) until the flow-through reached the OD280 of 0.02-0.06. The His-fusion
protein was eluted by addition of 700µl of either (i) cold elution buffer A (300mM NaCl, 50mM
phosphate buffer, 250mM imidazole, pH 8) or (ii) elution buffer B (urea 8M, 10mM Tris-HCl,
100mM phosphate buffer, pH 4.5) and fractions collected until the OD280 was 0.1. 21µl of each
fraction were loaded on a 12% SDS gel.

K) His-fusion proteins renaturation 

10% glycerol was added to the denatured proteins. The proteins were then diluted to 20µg/ml using
dialysis buffer I (10% glycerol, 0.5M arginine, 50mM phosphate buffer, 5mM reduced glutathione,
0.5mM oxidised glutathione, 2M urea, pH 8.8) and dialysed against the same buffer at 4°C for
12-14 hours. The protein was further dialysed against dialysis buffer II (10% glycerol, 0.5M arginine,
50mM phosphate buffer, 5mM reduced glutathione, 0.5mM oxidised glutathione, pH 8.8) for 12-14
hours at 4°C. Protein concentration was evaluated using the formula:

Protein (mg/ml) = (1.55 x OD280) - (0.76 x OD260)
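
As a worked illustration of this formula, the short Python sketch below evaluates it for a pair of absorbance readings. The function name and the example readings are hypothetical; only the two coefficients come from the text.

```python
def protein_mg_per_ml(od280: float, od260: float) -> float:
    """Protein concentration (mg/ml) from absorbance at 280 nm and 260 nm,
    using the coefficients given in the text."""
    return 1.55 * od280 - 0.76 * od260


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    print(f"{protein_mg_per_ml(1.0, 0.5):.2f} mg/ml")
```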

L) His-fusion large-scale purification (ORFs 111-129)

500ml of bacterial cultures were induced and the fusion proteins were obtained soluble in buffer
M1, M2 or M3 using the procedure described above. The crude extract of the bacteria was loaded
onto a Ni-NTA superflow column (Qiagen) equilibrated with buffer M1, M2 or M3 depending
on the solubilization buffer of the fusion proteins. Unbound material was eluted by washing the
column with the same buffer. The specific protein was eluted with the corresponding buffer
containing 500mM imidazole and dialysed against the corresponding buffer without imidazole.
After each run the columns were sanitized by washing with at least two column volumes of 0.5M
sodium hydroxide and reequilibrated before the next use.

M) Mice immunisations 

20µg of each purified protein were used to immunise mice intraperitoneally. In the case of ORFs
2, 4, 15, 22, 27, 28, 37, 76, 89 and 97, Balb-C mice were immunised with Al(OH)3 as adjuvant on
days 1, 21 and 42, and immune response was monitored in samples taken on day 56. For ORFs 44,
106 and 132, CD1 mice were immunised using the same protocol. For ORFs 25 and 40, CD1 mice
were immunised using Freund's adjuvant, rather than Al(OH)3, and the same immunisation
protocol was used, except that the immune response was measured on day 42, rather than 56.
Similarly, for ORFs 23, 32, 38 and 79, CD1 mice were immunised with Freund's adjuvant, but the
immune response was measured on day 49.

N) ELISA assay (sera analysis) 

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at
37°C. Bacterial colonies were collected from the agar plates using a sterile dacron swab and
inoculated into 7ml of Mueller-Hinton Broth (Difco) containing 0.25% Glucose. Bacterial growth
was monitored every 30 minutes by following OD620. The bacteria were allowed to grow until the OD
reached the value of 0.3-0.4. The culture was centrifuged for 10 minutes at 10000rpm. The
supernatant was discarded and bacteria were washed once with PBS, resuspended in PBS
containing 0.025% formaldehyde, and incubated for 2 hours at room temperature and then
overnight at 4°C with stirring. 100µl bacterial cells were added to each well of a 96 well Greiner
plate and incubated overnight at 4°C. The wells were then washed three times with PBT washing
buffer (0.1% Tween-20 in PBS). 200µl of saturation buffer (2.7% Polyvinylpyrrolidone 10 in
water) was added to each well and the plates incubated for 2 hours at 37°C. Wells were washed
three times with PBT. 200µl of diluted sera (Dilution buffer: 1% BSA, 0.1% Tween-20, 0.1% NaN3
in PBS) were added to each well and the plates incubated for 90 minutes at 37°C. Wells were
washed three times with PBT. 100µl of HRP-conjugated rabbit anti-mouse (Dako) serum diluted
1:2000 in dilution buffer were added to each well and the plates were incubated for 90 minutes at
37°C. Wells were washed three times with PBT buffer. 100µl of substrate buffer for HRP (25ml
of citrate buffer pH5, 10mg of O-phenylenediamine and 10µl of H2O) were added to each well and the
plates were left at room temperature for 20 minutes. 100µl H2SO4 was added to each well and OD490
was followed. The ELISA was considered positive when OD490 was 2.5 times that of the respective
pre-immune sera.
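
The positivity criterion can be expressed as a one-line check. The Python sketch below is illustrative: the function name is hypothetical, and it assumes that "2.5 times" means "at least 2.5 times" the pre-immune reading, which the text does not state explicitly.

```python
def elisa_positive(od490_immune: float, od490_preimmune: float,
                   fold: float = 2.5) -> bool:
    """True if the immune serum OD490 is at least `fold` times the OD490 of
    the matching pre-immune serum (the 'at least' is an assumption)."""
    return od490_immune >= fold * od490_preimmune


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    print(elisa_positive(1.0, 0.3), elisa_positive(0.5, 0.3))
```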

O) FACScan bacteria Binding Assay procedure. 

The acapsulated MenB M7 strain was plated on chocolate agar plates and incubated overnight at
37°C. Bacterial colonies were collected from the agar plates using a sterile dacron swab and
inoculated into 4 tubes containing 8ml each Mueller-Hinton Broth (Difco) containing 0.25%
glucose. Bacterial growth was monitored every 30 minutes by following OD620. The bacteria were
allowed to grow until the OD reached the value of 0.35-0.5. The culture was centrifuged for 10 minutes
at 4000rpm. The supernatant was discarded and the pellet was resuspended in blocking buffer (1%
BSA, 0.4% NaN3) and centrifuged for 5 minutes at 4000rpm. Cells were resuspended in blocking
buffer to reach OD620 of 0.07. 100µl bacterial cells were added to each well of a Costar 96 well
plate. 100µl of diluted (1:200) sera (in blocking buffer) were added to each well and plates
incubated for 2 hours at 4°C. Cells were centrifuged for 5 minutes at 4000rpm, the supernatant
aspirated and cells washed by addition of 200µl/well of blocking buffer in each well. 100µl of
R-Phycoerythrin conjugated F(ab)2 goat anti-mouse, diluted 1:100, was added to each well and plates
incubated for 1 hour at 4°C. Cells were spun down by centrifugation at 4000rpm for 5 minutes and
washed by addition of 200µl/well of blocking buffer. The supernatant was aspirated and cells
resuspended in 200µl/well of PBS, 0.25% formaldehyde. Samples were transferred to FACScan
tubes and read. The FACScan settings were: FL1 on, FL2 and FL3 off; FSC-H
threshold: 92; FSC PMT Voltage: E 02; SSC PMT: 474; Amp. Gains 7.1; FL-2 PMT: 539;
compensation values: 0.

P) OMV preparations

Bacteria were grown overnight on 5 GC plates, harvested with a loop and resuspended in 10 ml
20mM Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes and the bacteria disrupted
by sonication for 10 minutes on ice (50% duty cycle, 50% output). Unbroken cells were removed
by centrifugation at 5000g for 10 minutes and the total cell envelope fraction recovered by
centrifugation at 50000g at 4°C for 75 minutes. To extract cytoplasmic membrane proteins from
the crude outer membranes, the whole fraction was resuspended in 2% sarkosyl (Sigma) and
incubated at room temperature for 20 minutes. The suspension was centrifuged at 10000g for 10
minutes to remove aggregates, and the supernatant further ultracentrifuged at 50000g for 75
minutes to pellet the outer membranes. The outer membranes were resuspended in 10mM Tris-HCl,
pH8 and the protein concentration measured by the Bio-Rad Protein assay, using BSA as a
standard.

Q) Whole Extracts preparation 

Bacteria were grown overnight on a GC plate, harvested with a loop and resuspended in 1ml of 
20mM Tris-HCl. Heat inactivation was performed at 56°C for 30 minutes. 

R) Western blotting

Purified proteins (500ng/lane), outer membrane vesicles (5µg) and total cell extracts (25µg) derived
from MenB strain 2996 were loaded on 15% SDS-PAGE and transferred to a nitrocellulose
membrane. The transfer was performed for 2 hours at 150mA at 4°C, in transferring buffer (0.3%
Tris base, 1.44% glycine, 20% methanol). The membrane was saturated by overnight incubation
at 4°C in saturation buffer (10% skimmed milk, 0.1% Triton X100 in PBS). The membrane was
washed twice with washing buffer (3% skimmed milk, 0.1% Triton X100 in PBS) and incubated
for 2 hours at 37°C with mice sera diluted 1:200 in washing buffer. The membrane was washed
twice and incubated for 90 minutes with a 1:2000 dilution of horseradish peroxidase labelled
anti-mouse Ig. The membrane was washed twice with 0.1% Triton X100 in PBS and developed with
the Opti-4CN Substrate Kit (Bio-Rad). The reaction was stopped by adding water.

S) Bactericidal assay 

MC58 strain was grown overnight at 37°C on chocolate agar plates. 5-7 colonies were collected and
used to inoculate 7ml Mueller-Hinton broth. The suspension was incubated at 37°C on a nutator
and allowed to grow until OD620 was 0.5-0.8. The culture was aliquoted into sterile 1.5ml Eppendorf
tubes and centrifuged for 20 minutes at maximum speed in a microfuge. The pellet was washed
once in Gey's buffer (Gibco) and resuspended in the same buffer to an OD620 of 0.5, diluted
1:20000 in Gey's buffer and stored at 25°C.

50µl of Gey's buffer/1% BSA was added to each well of a 96-well tissue culture plate. 25µl of
diluted mice sera (1:100 in Gey's buffer/0.2% BSA) were added to each well and the plate
incubated at 4°C. 25µl of the previously described bacterial suspension were added to each well.
25µl of either heat-inactivated (56°C waterbath for 30 minutes) or normal baby rabbit complement
were added to each well. Immediately after the addition of the baby rabbit complement, 22µl of
each sample/well were plated on Mueller-Hinton agar plates (time 0). The 96-well plate was
incubated for 1 hour at 37°C with rotation and then 22µl of each sample/well were plated on
Mueller-Hinton agar plates (time 1). After overnight incubation the colonies corresponding to time
0 and time 1 hour were counted.

Table II gives a summary of the cloning, expression and purification results.
TABLE II - Summary of cloning, expression and purification

ORF       PCR/cloning   His-fusion expression   GST-fusion expression   Purification
orf 1     +             +                       +                       His-fusion
orf 2     +             +                       +                       GST-fusion
orf 2.1   +             n.d.                    +                       GST-fusion
orf 4     +             +                       +                       His-fusion
orf 5     +             n.d.                    +                       GST-fusion
orf 6     +             +                       +                       GST-fusion
orf 7     +             +                       +                       GST-fusion
orf 8     +             n.d.                    n.d.
orf 9     +             +                       +                       GST-fusion
orf 10    +             n.d.                    n.d.
orf 11    +             n.d.                    n.d.
orf 13    +             n.d.                    +                       GST-fusion
orf 15    +             +                       +                       GST-fusion
orf 17    +             n.d.                    n.d.
orf 18    +             n.d.                    n.d.
orf 19    +             n.d.                    n.d.
orf 20    +             n.d.                    n.d.
orf 22    +             +                       +                       GST-fusion
orf 23    +             +                       +                       His-fusion
orf 24    +             n.d.                    n.d.
orf 25    +             +                       +                       His-fusion
orf 26    +             n.d.                    n.d.
orf 27    +             +                       +                       GST-fusion
orf 28    +             +                       +                       GST-fusion
orf 29    +             n.d.                    n.d.
orf 32    +             +                       +                       His-fusion
orf 33    +             n.d.                    n.d.
orf 35    +             n.d.                    n.d.
orf 37    +             +                       +                       GST-fusion
orf 58    +             n.d.                    n.d.
orf 65    +             n.d.                    n.d.
orf 66    +             n.d.                    n.d.
orf 72    +             +                       n.d.                    His-fusion
orf 73    +             n.d.                    +                       n.d.
orf 75    +             n.d.                    n.d.
orf 76    +             +                       n.d.                    His-fusion
orf 79    +             +                       n.d.                    His-fusion
orf 83    +             n.d.                    +                       n.d.
orf 84    +             n.d.                    n.d.
orf 85    +             n.d.                    +                       GST-fusion
orf 89    +             n.d.                    +                       GST-fusion
orf 97    +             +                       +                       GST-fusion
orf 98    +             n.d.                    n.d.
orf 100   +             n.d.                    n.d.
orf 101   +             n.d.                    n.d.
orf 102   +             n.d.                    n.d.
orf 103   +             n.d.                    n.d.
orf 104   +             n.d.                    n.d.
orf 105   +             n.d.                    n.d.
orf 106   +             +                       +                       His-fusion
orf 109   +             n.d.                    n.d.
orf 110   +             n.d.                    n.d.
orf 111   +             +                       n.d.                    His-fusion
orf 113   +             +                       n.d.                    His-fusion
orf 115   n.d.          n.d.                    n.d.
orf 119   +             +                       n.d.                    His-fusion
orf 120   +             +                       n.d.                    His-fusion
orf 121   +             n.d.                    n.d.
orf 122   +             +                       n.d.                    His-fusion
orf 125   +             +                       n.d.                    His-fusion
orf 126   +             +                       n.d.                    His-fusion
orf 127   +             +                       n.d.                    His-fusion
orf 128   +             n.d.                    n.d.
orf 129   +             +                       n.d.                    His-fusion
orf 130   +             n.d.                    n.d.
orf 131   +             +                       +                       n.d.
orf 132   +             +                       +                       His-fusion
orf 133   +             n.d.                    +                       GST-fusion
orf 134   +             n.d.                    n.d.
orf 135   +             n.d.                    n.d.
orf 136   +             n.d.                    n.d.
orf 137   +             n.d.                    +                       GST-fusion
orf 138   +             n.d.                    +                       GST-fusion
orf 139   +             n.d.                    n.d.
orf 140   +             n.d.                    n.d.
orf 141   +             n.d.                    n.d.
orf 142   +             n.d.                    n.d.
orf 143   +             n.d.                    n.d.
orf 144   +             n.d.                    +                       n.d.
orf 147   +             n.d.                    n.d.





Example 1 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 1>: 

  1 ATGAAACAGA CAGTCAA.AT GCTTGCCGCC GCCCTGATTG CCTTGGGCTT
 51 GAACCGACCG GTGTGGNCGG ATGACGTATC GGATTTTCGG GAAAACTTGC
101 A.GCGGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG
151 TAT.TACAAA GGACGCGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG
201 GTATCGGCAG CCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG
251 GCTGGATGTA TGCCAACGGG CGCGC.GTGC GCCAAGATGA TACCGAAGCG
301 GTCAGATGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA
351 CAATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG
401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA
451 GCCCAAAACA ATTTGGGCGT GATGTATGCC GAAAGANCGC GCGTGCGCCA
501 AGACCG...



This corresponds to the amino acid sequence <SEQ ID 2; ORF37>:

  1 MKQTVXMLAA ALIALGLNRP VWXDDVSDFR ENLXAAAQGN AAAQYNLGAM
 51 YXQRTRVRRD DAEAVRWYRQ PAEQGLAQAQ YNLGWMYANG RXVRQDDTEA
101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ
151 AQNNLGVMYA ERXRVRQD...

Further work revealed the complete nucleotide sequence <SEQ ID 3>:



  1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT
 51 GAACCGAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC
101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AATACAATTT GGGCGCAATG
151 TATTACAAAG GACGCGGCGT GCGCCGGGAT GATGCTGAAG CGGTCAGATG
201 GTATCGGCAG GCGGCGGAAC AGGGGTTAGC CCAAGCCCAA TACAATTTGG
251 GCTGGATGTA TGCCAACGGG CGCGGCGTGC GCCAAGATGA TACCGAAGCG
301 GTCAGATGGT ATCGGCAGGC GGCAGCGCAG GGGGTTGTCC AAGCCCAATA
351 CAATTTGGGC GTGATATATG CCGAAGGACG TGGAGTGCGC CAAGACGATG
401 TCGAAGCGGT CAGATGGTTT CGGCAGGCGG CAGCGCAGGG GGTAGCCCAA
451 GCCCAAAACA ATTTGGGCGT GATGTATGCC GAAAGACGCG GCGTGCGCCA
501 AGACCGCGCC CTTGCACAAG AATGGTTTGG CAAGGCTTGT CAAAACGGAG
551 ACCAAGACGG CTGCGACAAT GACCAACGCC TGAAGGCGGG TTATTGA

This corresponds to the amino acid sequence <SEQ ID 4; ORF37-1>:

  1 MKQTVKWLAA ALIALGLNRA VWADDVSDFR ENLQAAAQGN AAAQYNLGAM
 51 YYKGRGVRRD DAEAVRWYRQ AAEQGLAQAQ YNLGWMYANG RGVRQDDTEA
101 VRWYRQAAAQ GVVQAQYNLG VIYAEGRGVR QDDVEAVRWF RQAAAQGVAQ
151 AQNNLGVMYA ERRGVRQDRA LAQEWFGKAC QNGDQDGCDN DQRLKAGY*
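
The correspondence between the nucleotide sequences and the amino acid sequences can be checked by conceptual translation. The Python sketch below is a minimal illustration, not the software used for the analysis: it uses the standard codon assignments, writes stop codons as "*", and, as an assumption chosen to mirror the listings here, writes "X" for any codon containing an undetermined base (".", "N").

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; whitespace is ignored and any
    codon not made purely of A/C/G/T is reported as 'X'."""
    seq = "".join(dna.upper().split())
    usable = len(seq) - len(seq) % 3      # drop a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, usable, 3))


if __name__ == "__main__":
    # First 30 nucleotides of SEQ ID 3.
    print(translate("ATGAAACAGA CAGTCAAATG GCTTGCCGCC"))  # MKQTVKWLAA
```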

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 5>:



  1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT
 51 GAACCAAGCG GTGTGGGCGG ATGACGTATC GGATTTTCGG GAAAACTTGC
101 AGGCGGCAGC ACAGGGAAAT GCAGCAGCCC AAAACAATTT GGGCGTGATG
151 TATGCCGAAA GACGCGGCGT GCGCCAAGAC CGCGCCCTTG CACAAGAATG
201 GCTTGGCAAG GCTTGTCAAA ACGGATACCA AGACAGCTGC GACAATGACC
251 AACGCCTGAA AGCGGGTTAT TGA

This encodes a protein having amino acid sequence <SEQ ID 6; ORF37a>:

  1 MKQTVKWLAA ALIALGLNQA VWADDVSDFR ENLQAAAQGN AAAQNNLGVM
 51 YAERRGVRQD RALAQEWLGK ACQNGYQDSC DNDQRLKAGY *



The originally-identified partial strain B sequence (ORF37) shows 68.0% identity over a 75 aa
overlap with ORF37a:

                    10        20        30        40        50        60
orf37.pep   MKQTVXMLAAALIALGLNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD
            |||||  |||||||||||: || |||||||||| |||||||||| |||:|| :|  ||:|
orf37a      MKQTVKWLAAALIALGLNQAVWADDVSDFRENLQAAAQGNAAAQNNLGVMYAERRGVRQD
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf37.pep   DAEAVRWYRQPAEQGLAQAQYNLGWMYANGRXVRQDDTEAVRWYRQAAAQGVVQAQYNLG
             | | :|  :  ::|
orf37a      RALAQEWLGKACQNGYQDSCDNDQRLKAGYX
                    70        80        90
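
The identity figure quoted for this alignment can be reproduced by counting identical columns over the overlap. The Python sketch below is a minimal illustration and is not the alignment program used for the analysis (which the text does not name): it assumes the two sequences are already aligned, marks gaps with "-", and divides by the full overlap length, gap columns included.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns holding the same residue in both
    sequences; gap columns count towards the overlap but never match."""
    if len(aligned_a) != len(aligned_b) or not aligned_a:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    identical = sum(1 for a, b in zip(aligned_a, aligned_b)
                    if a == b and a != gap)
    return 100.0 * identical / len(aligned_a)


if __name__ == "__main__":
    # First 75 residues of SEQ ID 2 (ORF37) and SEQ ID 6 (ORF37a).
    orf37 = ("MKQTVXMLAAALIALGLNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD"
             "DAEAVRWYRQPAEQG")
    orf37a = ("MKQTVKWLAAALIALGLNQAVWADDVSDFRENLQAAAQGNAAAQNNLGVMYAERRGVRQD"
              "RALAQEWLGKACQNG")
    print(f"{percent_identity(orf37, orf37a):.1f}%")  # 68.0%
```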

Further work identified the corresponding gene in N. gonorrhoeae <SEQ ID 7>:

  1 ATGAAACAGA CAGTCAAATG GCTTGCCGCC GCCCTGATTG CCTTGGGCTT
 51 GAACCAAGCG GTGTGGGCGG GTGACGTATC GGATTTTCGG GAAAACTTGC
101 AGgcggcaGA ACaggGAAAT GCAGCAGCCC AATTCAATTT GGGCGTGATG
151 TATGAAAATG GACAAGGAGT TCGTCAAGAT TATGTACAGG CAGTGCAGTG
201 GTATCGCAAG GCTTCAGAAC AAGGGGATGC CCAAGCCCAA TACAATTTGG
251 GCTTGATGTA TTACGATGGA CGCGGCGTGC GCCAAGACCT TGCGCTCGCT
301 CAACAATGGC TTGGCAAGGC TTGTCAAAAC GGAGACCAAA ACAGCTGCGA
351 CAATGACCAA CGCCTGAAGG CGGGTTATTA A

This encodes a protein having amino acid sequence <SEQ ID 8; ORF37ng>:

  1 MKQTVKWLAA ALIALGLNQA VWAGDVSDFR ENLQAAEQGN AAAQFNLGVM
 51 YENGQGVRQD YVQAVQWYRK ASEQGDAQAQ YNLGLMYYDG RGVRQDLALA
101 QQWLGKACQN GDQNSCDNDQ RLKAGY*



The originally-identified partial strain B sequence (ORF37) shows 64.9% identity over a 111aa
overlap with ORF37ng:

orf37.pep   MKQTVXMLAAALIALGLNRPVWXDDVSDFRENLXAAAQGNAAAQYNLGAMYXQRTRVRRD  60
            |||||  |||||||||||: ||  ||||||||| || |||||||:|||:|| :   ||:|
orf37ng     MKQTVKWLAAALIALGLNQAVWAGDVSDFRENLQAAEQGNAAAQFNLGVMYENGQGVRQD  60

orf37.pep   DAEAVRWYRQPAEQGLAQAQYNLGWMYANGRXVRQDDTEAVRWYRQAAAQGVVQAQYNLG  120
             ::||:|||: :||| |||||||| || :|| |||| : | :|  :|  :|
orf37ng     YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQDLALAQQWLGKACQNGDQNSCDNDQ  120

orf37.pep   VIYAEGRGVRQDDVEAVRWFRQAAAQGVAQAQNNLGVMYAERXRVRQD  168

orf37ng     RLKAGY  126


The complete strain B sequence (ORF37-1) and ORF37ng show 51.5% identity in 198 aa overlap:

                    10        20        30        40        50        60
orf37-1.pep MKQTVKWLAAALIALGLNRAVWADDVSDFRENLQAAAQGNAAAQYNLGAMYYKGRGVRRD
            ||||||||||||||||||:|||| |||||||||||| |||||||:|||:|| :|:|||:|
orf37ng     MKQTVKWLAAALIALGLNQAVWAGDVSDFRENLQAAEQGNAAAQFNLGVMYENGQGVRQD
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf37-1.pep DAEAVRWYRQAAEQGLAQAQYNLGWMYANGRGVRQDDTEAVRWYRQAAAQGVVQAQYNLG
             ::||:|||:|:||| |||||||| || :|||||||
orf37ng     YVQAVQWYRKASEQGDAQAQYNLGLMYYDGRGVRQD------------------------
                    70        80        90

                   130       140       150       160       170       180
orf37-1.pep VIYAEGRGVRQDDVEAVRWFRQAAAQGVAQAQNNLGVMYAERRGVRQDRALAQEWFGKAC
                                                             ||||:|:||||
orf37ng     ------------------------------------------------LALAQQWLGKAC
                                                             100

                   190       199
orf37-1.pep QNGDQDGCDNDQRLKAGYX
            |||||::||||||||||||
orf37ng     QNGDQNSCDNDQRLKAGYX
           110       120



Computer analysis of these amino acid sequences indicates a putative leader sequence, and it was 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF37-1 (11kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
1A shows the results of affinity purification of the GST-fusion protein, and Figure 1B shows the
results of expression of the His-fusion in E.coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA (positive result), FACS analysis (Figure 1C), and a
bactericidal assay (Figure 1D). These experiments confirm that ORF37-1 is a surface-exposed
protein, and that it is a useful immunogen.

Figure 1E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF37-1.

Example 2

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 9>:

TTCGGCGA CATCGGCGGT TTGAAGGTCA ATGCCCCCGT CAAATCCGCA 
GGCGTATTGG TCGGGCGCGT CGGCGCTATC GGACTTGACC CGAAATCCTA 
TCAGGCGAGG GTGCGCCTCG ATTTGGACGG CAAGTATCAG TTCAGCAGCG 
ACGTTTCCGC GCAAATCCTG ACTTCsGGAC TTTTGGGCGA GCAGTACATC 
GGGCTGCAGC AGGGCGGCGA CACGGAAAAC CTTGCTGCCG GCGACACCAT 

CTCCGTAACC AGTTCTGCAA TGGTTCTGGA AAACCTTATC GGCAAATTCA 
TGACGAGTTT TGCCGAGAAA AATGCCGACG GCGGCAATGC GGAAAAAGCC 
GCCGAATAA 

This corresponds to the amino acid sequence <SEQ ID 10>: 

1 FGDIGGLKVN APVKSAGVLV GRVGAIGLDP KSYQARVRLD LDGKYQFSSD 
51 VSAQILTSGL LGEQYIGLQQ GGDTENLAAG DTISVTSSAM VLENLIGKFM 
101 TSFAEKNADG GNAEKAAE* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a hypothetical H. influenzae protein (yrbd.haein; accession number P45029) 
SEQ ID 9 and yrbd.haein show 48.4% aa identity in 122 aa overlap: 

20 30 40 50 60 70 

yrbd . h LGIGALVFLGLRVANVQGFAETKSYTVTATFDNIGGLKVRAPLKIGGVVIGRVSAITLDE 

I : : I I I II I : I I : I : I I : : I I I : ! i : II 
N . m FGD I GGLKVNAPVKSAGVLVGRVGAI GLDP 

25 10 20 30 

80 90 100 110 120 130 

yrbd.h KSYLPKVSIAINQEYNEIPENSSLSIKTSGLLGEQYIALTMGFDDGDTAMLKNGSQIQDT 
, n I I I : : I : : : : : : I : : : : : I I II I I I I I I I I : I I I I I : I : I : | | 

JVJ N .m KSYQARVRLDLDGKY-QFSSDVSAQILTSGLLGEQYIGLQQG GDTENLAAGDTISVT 

40 50 60 70 80 

140 150 160 

yrbd. h TSAMVLEDLIGQFL— YGSKKSDGNEKSESTEQ 
35 : I I I M I : I I I : I : : : : I : : II :::::: | : 

N.m SSAMVLENLIGKFMTSFAEKNADGGNAEKAAEX 
90 100 110 120 

Homology with a predicted ORF from N. gonorrhoeae 
SEQ ID 9 shows 99.2% identity over a 118aa overlap with a predicted ORF from N. gonorrhoeae: 

20 30 40 50 60 70 

Y rbd GAAAVAFLAFRVAGGAAFGGSDKTYAVYADFGDIGGLKVNAPVKSAGVLVGRVGAIGLDP 

I M I I M II M I I I I I II M I I I I I I I I I I 

d< - N - m FGD I GGLKVNAPVKSAGVLVGRVGAI GLDP 

^° 10 20 30 

80 90 100 110 120 130 

yrbd KSYQARVRLDLDGKYQFSSDVSAQILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM 

, n I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I II M 

3U N ' m KSYQARVRLDLDGKYQFSSDVSAQILTSGLLGEQYIGLQQGGDTENLAAGDTISVTSSAM 

40 50 60 70 80 90 
140 150 160 

yrbd VLENL I GKFMT S FAEKNAEGGNAEKAAEX 

I I 1 I I I I I I I I I I I i I I I : 1 I M I I I I I I 
N.m VLENLIGKFMTSFAEKNADGGNAEKAAEX 
100 110 120 

The complete yrbd H. influenzae sequence has a leader sequence and it is expected that the full- 
length homologous N. meningitidis protein will also have one. This suggests that it is either a 
membrane protein, a secreted protein, or a surface protein and that the protein, or one of its 
epitopes, could be a useful antigen for vaccines or diagnostics. 
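
The correspondence stated in this example between SEQ ID 9 and SEQ ID 10 can be checked by translating the nucleotide sequence in its first reading frame. The following is a minimal sketch, assuming the standard genetic code (the specification does not say which translation table was used). The names CODON_TABLE and translate are illustrative and do not come from the specification, and any codon containing an ambiguity symbol is simply reported as X.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: the
# specification does not state which translation table was used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; whitespace is ignored."""
    seq = "".join(dna.split()).upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    # Codons with ambiguity symbols are not in the table and become X.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First row and last group of SEQ ID 9 as printed in this example.
first_row = translate("TTCGGCGA CATCGGCGGT TTGAAGGTCA ATGCCCCCGT")
last_group = translate("GCCGAATAA")
print(first_row)
print(last_group)
```

The first row translates to the opening residues of SEQ ID 10, and the last group ends in the stop codon shown there as an asterisk.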



Example 3 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 11>: 

1 ..ATTTTGATAT ACCTCATCCG CAAGAATCTA GGTTCGCCCG TCTTCTTCTT 
51 TCAGGAACGC CCCGGAAAGG ACGGAAAACC TTTTAAAATG GTCAAATTCC 
101 GTTCCATGCG CGACGGCTTG TATTCAGACG GCATTCCGCT GCCCGACGGA 
151 GAACGCCTGA CACCGTTCGG CAAAAAACTG CGTGCCGcCA GTwTGGACGA 
201 ACTGCCTGAA TTATGGAATA TCTTAAAAGG CGAGATGAGC CTGGTCGGCC 
251 CCCGCCCGCT GCTGATGCAA TATCTGCCGC TGTACGACAA CTTCCAAAAC 
301 CGCCGCCACG AAATGAAACC CGGCATTACC GGCTGGGCGC AGGTCAACGG 
351 GCGCAACGCg CTTTCGTGGG ACGAAAAATT CGCCTGCGAT GTTTGGTATA 
401 TCGACCACTT CAGCCTGTGC CTCGACATCA AAATCCTACT GCTGACGGTT 
451 AAAAAAGTAT TAATCAAGGA AGGGATTTCC GCACAGGGCG AACA.aCCAT 
501 GCCCCCTTTC ACAGGAAAAC GCAAACTCGC CGTCGTCGGT GCGGGCGGAC 
551 ACGGAAAAGT CGTTGCCGAC CTTGCCGCCG CACTCGGCCG GTACAGGGAA 
601 ATCGTTTTTC TGGACGACCG CGCACAAGGC AGCGTCAACG GCTTTTCCGT 
651 CATCGGCACG ACGCTGCTGC TTGAAAACAG TTTATCGCCC GAACAATACG 
701 ACGTCGCCGT CGCCGTCGGC AACAACCGCA TCCGCCGCCA AATCGCCGAA 
751 AAAGCCGCCG CGCTCGGCTT CGCCCTGCCC GTACTGGTTC ATCCGGACGC 
801 GACCGTCTCG CCTTCTGCAA CAGTCGGACA AGGCAGCGTC GTTATGGCGA 
851 AAGCGGTCG.. 

This corresponds to the amino acid sequence <SEQ ID 12; ORF3>: 



1 ..ILIYLIRKNL GSPVFFFQER PGKDGKPFKM VKFRSMRDGL YSDGIPLPDG 
51 ERLTPFGKKL RAASXDELPE LWNILKGEMS LVGPRPLLMQ YLPLYDNFQN 
101 RRHEMKPGIT GWAQVNGRNA LSWDEKFACD VWYIDHFSLC LDIKILLLTV 
151 KKVLIKEGIS AQGEXTMPPF TGKRKLAVVG AGGHGKVVAD LAAALGRYRE 
201 IVFLDDRAQG SVNGFSVIGT TLLLENSLSP EQYDVAVAVG NNRIRRQIAE 
251 KAAALGFALP VLVHPDATVS PSATVGQGSV VMAKAV.. 

Further sequence analysis revealed the complete nucleotide sequence <SEQ ID 13>: 

1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG 

51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA 

101 AGAATCTAGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC 

151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCGCG ACGCGCTTGA 

201 TTCAGACGGC ATTCCGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA 

251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCTGAATT ATGGAATATC 

301 TTAAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA 

351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCCG 

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 

451 GAAAAATTCG CCTGCGATGT TTGGTATATC GACCACTTCA GCCTGTGCCT 

501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAGGAAG 

551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC 

601 AAACTCGCCG TCGTCGGTGC GGGCGGACAC GGAAAAGTCG TTGCCGACCT 

651 TGCCGCCGCA CTCGGCCGGT ACAGGGAAAT CGTTTTTCTG GACGACCGCG 

701 CACAAGGCAG CGTCAACGGC TTTTCCGTCA TCGGCACGAC GCTGCTGCTT 

751 GAAAACAGTT TATCGCCCGA ACAATACGAC GTCGCCGTCG CCGTCGGCAA 

801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG 

851 CCCTGCCCGT TCTGGTTCAT CCGGACGCGA CCGTCTCGCC TTCTGCAACA 
901 GTCGGACAAG GCAGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCAGGCAG 

951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ACTGCCTGCT TAACGCTTTC GTCCACATCA GCCCAGGCGC GCACCTGTCG 

1051 GGCAACACGC ATATCGGCGA AGAAAGCTGG ATAGGCACGG GCGCGTGCAG 

1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG 

1151 TCGTCGTACG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAATCCGGCA 

1201 AAGCCGCTGC CGCGCAAAAA CCCCGAGACC TCGACAGCAT AA 

This corresponds to the amino acid sequence <SEQ ID 14; ORF3-1>: 

1 MSKFFKRLFD IVASASGLIF LSPVFLILIY LIRKNLGSPV FFFQERPGKD 
51 GKPFKMVKFR SMRDALDSDG IPLPDGERLT PFGKKLRAAS LDELPELWNI 
101 LKGEMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD 
151 EKFACDVWYI DHFSLCLDIK ILLLTVKKVL IKEGISAQGE ATMPPFTGKR 
201 KLAVVGAGGH GKVVADLAAA LGRYREIVFL DDRAQGSVNG FSVIGTTLLL 
251 ENSLSPEQYD VAVAVGNNRI RRQIAEKAAA LGFALPVLVH PDATVSPSAT 
301 VGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLNAF VHISPGAHLS 
351 GNTHIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA 
401 KPLPRKNPET STA* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF3 shows 93.0% identity over a 286aa overlap with an ORF (ORF3a) from strain A of N. 
meningitidis: 



orf 3 . pep ILIYLI RKNLGSPVFFFQERPGKDGKPFKMVKFR 

I I I I I I I M 1 I I I I I I II I I I I I I I M III I I I I 
orf 3a MSKFFKRLFD IVAS ASGLIFLSPVFLILIYLI RKNLGSPVFFFOERPGKDGKPFKMVKFR 

10 20 30 40 50 60 

40 50 60 70 80 90 

orf 3 . pep SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLLMQYLPL 
I I •• I : I I I I I I I I II I I I II I I I II I I [ I I I I I I I I : i I I : I I I I I I I | ! | | | | | | | 
orf 3a SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL 
70 80 90 100 110 120 

100 110 120 130 140 150 

orf 3 .pep YDNFQNRRHEMKPGITGWAQVMGRNALSWDEKFACDVWYIDHFS LCLDIKILLLTVKKVL 

i I I I I I I I I I I I I I I I I I I II I I I I : I I I I : M M M M I M M I I III M I I I 

orf 3a YDNFQNRRHEMKPGITGWAQVNGRWALSWDERFACDIWYIDHFS LCLDIKILLLTVKKVL 
130 140 150 160 170 180 

160 170 180 190 200 210 

orf 3 .pep IKEGISAQGEXTMPPFTGKRKLAWGAGGHGKWADLAAALGRYREIVFLDDRAQGSVNG 
M I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I : M | | M I 11111111:111111 
orf 3a IKEGISAQGEATMPPFTGKRKLAWGAGGHGKVVAELAAALGTYGEIVFLDDRVQGSVNG 
190 200 210 220 230 240 

220 230 240 250 260 270 

orf 3 . pep FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 

1 I I I I I I I I I I I : I : I I I I I I I | I | | | | | | | | | [ | | | | | | | | : | | | : | | | | |, | 

orf 3a FPVIGTTLLLENSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT 
250 260 270 280 290 300 

280 

orf 3. pep VGQGSVVMAKAV 
1111:1111111 

orf 3a VGQGGWMAKAWQADSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESW 
310 320 330 340 350 360 

The complete length ORF3a nucleotide sequence <SEQ ID 15> is: 

1 ATGAGTAAAT TCTTCAAACG CCTGTTTGAC ATTGTTGCCT CCGCCTCGGG 
51 ACTGATTTTC CTCTCGCCAG TATTTTTGAT TTTGATATAC CTCATCCGCA 
101 AGAATCTGGG TTCGCCCGTC TTCTTCTTTC AGGAACGCCC CGGAAAGGAC 
151 GGAAAACCTT TTAAAATGGT CAAATTCCGT TCCATGCACG ACGCGCTTGA 

201 TTCAGACGGC ATTCTGCTGC CCGACGGAGA ACGCCTGACA CCGTTCGGCA 

251 AAAAACTGCG TGCCGCCAGT TTGGACGAAC TGCCCGAACT GTGGAACGTC 

301 CTCAAAGGCG ACATGAGCCT GGTCGGCCCC CGCCCGCTGC TGATGCAATA 

351 TCTGCCGCTG TACGACAACT TCCAAAACCG CCGCCACGAA ATGAAACCGG 

401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 

451 GAACGCTTCG CATGCGACAT CTGGTATATC GACCACTTCA GCCTGTGCCT 

501 CGACATCAAA ATCCTACTGC TGACGGTTAA AAAAGTATTA ATCAAAGAAG 

551 GGATTTCCGC ACAGGGCGAA GCCACCATGC CCCCTTTCAC AGGAAAACGC 

601 AAACTTGCCG TCGTCGGTGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT 

651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCG 

701 TCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT 

751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCGCCGTCG CCGTCGGCAA 

801 CAACCGCATC CGCCGCCAAA TCGCCGAAAA AGCCGCCGCG CTCGGCTTCG 

851 CCCTGCCCGT CCTGATTCAT CCGGACTCGA CCGTCTCGCC TTCTGCAACA 

901 GTCGGACAAG GCGGCGTCGT TATGGCGAAA GCCGTCGTAC AGGCTGACAG 

951 CGTATTGAAA GACGGCGTAA TTGTGAACAC TGCCGCCACC GTCGATCACG 

1001 ATTGCCTGCT TGATGCTTTC GTCCACATCA GCCCGGGCGC GCACCTGTCG 

1051 GGCAACACGC GTATCGGCGA AGAAAGCTGG ATAGGCACAG GCGCGTGCAG 

1101 CCGCCAGCAG ATCCGTATCG GCAGCCGCGC AACCATTGGA GCGGGCGCAG 

1151 TCGTCGTGCG CGACGTTTCA GACGGCATGA CCGTCGCGGG CAACCCGGCA 

1201 AAACCATTGG CAGGCAAAAA TACCGAGACC CTGCGGTCGT AA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 16>: 

1 MSKFFKRLFD IVASASGLIF LSPVFLILIY LIRKNLGSPV FFFQERPGKD 
51 GKPFKMVKFR SMHDALDSDG ILLPDGERLT PFGKKLRAAS LDELPELWNV 
101 LKGDMSLVGP RPLLMQYLPL YDNFQNRRHE MKPGITGWAQ VNGRNALSWD 
151 ERFACDIWYI DHFSLCLDIK ILLLTVKKVL IKEGISAQGE ATMPPFTGKR 
201 KLAVVGAGGH GKVVAELAAA LGTYGEIVFL DDRVQGSVNG FPVIGTTLLL 
251 ENSLSPEQFD IAVAVGNNRI RRQIAEKAAA LGFALPVLIH PDSTVSPSAT 
301 VGQGGVVMAK AVVQADSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS 
351 GNTRIGEESW IGTGACSRQQ IRIGSRATIG AGAVVVRDVS DGMTVAGNPA 
401 KPLAGKNTET LRS* 

Two transmembrane domains are underlined. 
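
Candidate membrane-spanning stretches of this kind are often screened with a sliding-window hydropathy average. The following is a minimal sketch using the Kyte-Doolittle scale with a 19-residue window; this scale and window are swapped in purely for illustration, since the specification does not say which method produced its transmembrane predictions. The names KD, window_hydropathy and N_TERM are hypothetical.

```python
# Kyte-Doolittle hydropathy values (a swapped-in illustrative scale; the
# specification does not name the method behind its predictions).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full-length window along the sequence."""
    seq = "".join(protein.split()).upper()
    values = [KD[aa] for aa in seq]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# First 50 residues of SEQ ID 16 as printed above.
N_TERM = "MSKFFKRLFD IVASASGLIF LSPVFLILIY LIRKNLGSPV FFFQERPGKD"
scores = window_hydropathy(N_TERM)
peak = max(range(len(scores)), key=scores.__getitem__)
print(f"most hydrophobic window starts at residue {peak + 1}, "
      f"mean {scores[peak]:.2f}")
```

A window mean well above zero marks a hydrophobic stretch that merits a closer look; it is a screening heuristic only and does not by itself establish a transmembrane domain.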



ORF3-1 shows 94.6% identity in 410 aa overlap with ORF3a: 



MSKFFKRLFD I VASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 
I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I i I I I I I M 1 I I I I 
MSKFFKRLFDIVASASGLIFLSPVFLILIYLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 
10 20 30 40 50 60 

70 80 90 100 110 120 

SMHDALDSDGILLPDGERLTPFGKKLRAASLDELPELWNVLKGDMSLVGPRPLLMQYLPL 
I I : I M I I I II I M M I I I I I I I M M I I I I I M M I I : M I : M II I M M I I I I II I 
SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL 

70 80 90 100 110 120 

130 140 150 160 170 180 

YDNFQNRRHEMKPGITGWAQVNGRNALSWDERFACDIWYIDHFSLCLDIKILLLTVKKVL 

YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 
130 140 150 160 170 180 

190 200 210 220 230 240 

IKEGISAQGEATMPPFTGKRKLAWGAGGHGKWAELAAALGTYGEIVFLDDRVQGSVNG 
I I I I I I I I I I M I I I I I I I I I II I I I I I I I M I I I : I I I I I I I II II II II : I I I I I I 
IKEGI SAQGEATMPPFTGKRKLAWGAGGHGKWADLAAALGRYRE IVFLDDRAQGSVNG 

190 200 210 220 230 240 

250 260 270 280 290 300 

FPVIGTTLLLENSLSPEQFDIAVAVGNNRIRRQIAEKAAALGFALPVLIHPDSTVSPSAT 
I I I I I ! Ill I M I I I M : I : I M I II Ml M M II I M M M I I I I I : I I I : I I I IN I 
FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATV3PSAT 

250 260 270 280 290 300 
310 320 330 340 350 360 

orf3a "DGP vGQGGWMAKAWQADSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESW 
| | | | : | | | | | I I I I I | I I I I I I I I I I I I I I I I I I I 1 : I M I I I I I I I II I I I : M I I 1 I 
5 orf3-l VGQGSWMAKAWQAGSVLKDGVIVNTAATVDHDCLLNAFVHISPGAHLSGNTHIGEESW 

310 320 330 340 350 360 

370 380 390 400 410 

orf3a pep iGTGACSRQQIRIGSRATIGAGAVWRDVSDGMTVAGNPAKPLAGKNTETLRSX 
10 ' I I I I H II I I I I M I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I H 

orf3 _l IGTGACSRQQIRIGSRATIGAGAVWRDVSDGMTVAGNPAKPLPRKNPETSTAX 
370 380 390 400 410 

Homology with hypothetical protein encoded by yvfc gene (accession Z71928) of B. subtilis 
ORF3 and YVFC proteins show 55% aa identity in 170 aa overlap (BLASTp): 

I YLIRKNLG 3 PVFFFQERPGKDGKPFKMVKFRSMRDGLYS DGI PLPDGERLT PFGKKLRA 62 
I ++R +GSPVFF Q RPG GKPF + KFR+M D S G LPD RLT G+ +R 
IAWRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTDERDSKGNLLPDEVRLTKTGRLIRK 8 6 

ASXDE LPELWN I LKGEMS LVGPRPLLMQYL PL YDNFQNRRHEMKPGI TGWAQVNGRNAL S 122 

S DELP+L N+LKG++SLVGPRPLLM YLPLY Q RRHE+KPGITGWAQ+NGRNA+S 
LSIDELPQLLNVLKGDLSLVGPRPLLMDYLPLYTEKQARRHEVKPGITGWAQINGRNAIS 14 6 

WDEKFACDVWYIDHFSLCLDXXXXXXXXXXXXXXEGISAQGEXTMPPFTG 172 
W++KF DVWY+D++S LD EGI T FTG 

WEKKFELDVWYVDNWSFFLDLKILCLTVRKVLVSEGIQQTNHVTAERFTG 19 6 

Homology with a predicted ORF from N. gonorrhoeae 

ORF3 shows 86.3% identity over a 286aa overlap with a predicted ORF (ORF3.ng) from N. 
gonorrhoeae: 



ORF3 


3 




27 


ORF3 


63 




87 


ORF3 


123 




147 



orf3 

orf3ng 

orf3 

orf3 



ILIYLI RKNLGSPVFFFQERPGKDGKPFKMVKFR 34 
: I I I I I I I I I I I I I I : : I I I I I I I I I I I I I I ! I 
MSKAVKRLFDIIAS ASGLIVLSPVFLVLIYLI RKNKGSPVFFIRERPGKDGKPFKMVKFR 60 

SMRDGLYSDGIPLPDGERLTPFGKKLRAASXDELPELWNILKGEMSLVGPRPLLMQYLPL 94 
I I I I : I 11111111:1111 I I II I I I : I I I I I I I I I : I I ! I II I I I I I I I I I I I I I I 
SMRDALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL 120 

YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 154 
I : : I II I i I I I I I I II I I I I I I I I I I I I I I I I I : I I I I I I : I I : I 1 : I I I : I I I I I I 1 
orf 3ng YNKFQNRRHEMKPGITGWAQVNGRNALSWDEKFSCDVWYTDNFSFWLDMKILFLTVKKVL 180 

orf 3 I KE G I S AQGEXTMP P FT GKRKLAWGAGGHGKVVADLAAALGRYRE IV FLDDRAQG S VNG 214 

M I I II I I I I I I I I I : I : I I I I I : I II I I I II I I : I I I I I I I I M M II I : I I II I I 
orf 3ng IKEGISAQGEATMPPFAGNRKLAVIGAGGHGKWAELAAALGTYGEIVFLDDRTQGSVNG 2 40 

orf 3 FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 274 

I I I I I I I I I I I M I I I I : I :: I I I I I I I I I I I I : I : I I I I I I I I I I : I II II M II I 
O r f 3 ng FPVI GT TLLLEN SLSPEQFDI TVAVGNNR I RRQ I TEN AAALGFKL PVL I H PD AT VS P S AI 3 00 

orf 3 VGQGSWMAKAV 28 6 

: I I I I I I I I I I I 

orf 3ng IGQGSVVMAKAWQAGSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESR 360 

The complete length ORF3ng nucleotide sequence <SEQ ID 17> is: 

1 ATGAGTAAAG CCGTCAAACG CCTGTTCGAC ATCATCGCAT CCGCATCGGG 
51 GCTGATTGTC CTGTCGCCCG TGTTTTTGGT TTTAATATAC CTCATCCGCA 
101 AAAACTTAGG TTCGCCCGTC TTCTTCattC GGGAACGCCc cgGAAAGGAc 
151 ggaaaacCTT TTAAAATGGT CAAATTCCGT TCCAtgcgcg acgcgcttGA 
201 TTCAGACGGC ATTCCGCTGC CCGATAGCGA ACGCCTGACC GATTTCGGCA 
251 AAAAATTACG CGCCACCAGT TTGGACGAAC TTCCTGAATT ATGGAATGTC 
301 CTCAAAGGCG AGATGAGCCT GGTCGGCCCC CGCCCGCTTT TGATGCAGTA 
351 TCTGCCGCTT TACAACAAAT TTCAAAACCG CCGCCACGAA ATGAAACCGG 
401 GCATTACCGG CTGGGCGCAG GTCAACGGGC GCAACGCGCT TTCGTGGGAC 
451 GAAAAGTTCT CCTGCGATGT TTGGTACACC GACAATTTCA GCTTTTGGCT 
501 GGATATGAAA ATCCTGTTTC TGACAGTCAA AAAAGTCTTG ATTAAAGAAG 
551 GCATTTCGGC GCAAGGGGAA GCCACCATGC CCCCTTTCGC GGGGAATCGC 
601 AAACTCGCCG TTATCGGCGC GGGCGGACAC GGCAAAGTCG TTGCCGAGCT 
651 TGCCGCCGCA CTCGGCACAT ACGGCGAAAT CGTTTTTCTG GACGACCGCA 
701 CCCAAGGCAG CGTCAACGGC TTCCCCGTCA TCGGCACGAC GCTGCTGCTT 
751 GAAAACAGTT TATCGCCCGA ACAATTCGAC ATCACCGTCG CCGTCGGCAA 
801 CAACCGCATC CGCCGCCAAA TCACCGAAAA CGCCGCCGCG CTCGGCTTCA 
851 AACTGCCCGT TCTGATTCAT CCCGACGCGA CCGTCTCGCC TTCTGCAATA 
901 ATCGGACAAG GCAGCGTCGT AATGGCGAAA GCCGTCGTAC AGGCCGGCAG 
951 CGTATTGAAA GACGGCGTGA TTGTGAACAC TGCCGCCACC GTCGATCACG 
1001 ACTGCCTGCT TGACGCTTTC GtccaCATCA GCCCGGGCGC GCACCTGTCG 
1051 GGCAACACGC GTATCGGCGA AGAAAGCCGG ATAGGCACGG GCGCGTGCAG 
1101 CCGCCAGCAG ACAACCGTCG GCAGCGGGGT TACCgccgGT GCAGGGgcGG 
1151 TTATCGTATG CGACATCCCG GACGGCATGA CCGTCGCGGG CAACCCGGCA 
1201 AAGCCCCTTA CGGGCAAAAA CCCCAAGACC GGGACGGCAT AA 

This encodes a protein having amino acid sequence <SEQ ID 18>: 

1 MSKAVKRLFD IIASASGLIV LSPVFLVLIY LIRKNLGSPV FFIRERPGKD 
51 GKPFKMVKFR SMRDALDSDG IPLPDSERLT DFGKKLRATS LDELPELWNV 
101 LKGEMSLVGP RPLLMQYLPL YNKFQNRRHE MKPGITGWAQ VNGRNALSWD 
151 EKFSCDVWYT DNFSFWLDMK ILFLTVKKVL IKEGISAQGE ATMPPFAGNR 
201 KLAVIGAGGH GKVVAELAAA LGTYGEIVFL DDRTQGSVNG FPVIGTTLLL 
251 ENSLSPEQFD ITVAVGNNRI RRQITENAAA LGFKLPVLIH PDATVSPSAI 
301 IGQGSVVMAK AVVQAGSVLK DGVIVNTAAT VDHDCLLDAF VHISPGAHLS 
351 GNTRIGEESR IGTGACSRQQ TTVGSGVTAG AGAVIVCDIP DGMTVAGNPA 
401 KPLTGKNPKT GTA* 

This protein shows 86.9% identity in 413 aa overlap with ORF3-1: 

10 20 30 40 50 60 

orf 3-l.pep MSKFFKRLFDIVASASGLIFLSPVFLIL1YLIRKNLGSPVFFFQERPGKDGKPFKMVKFR 
Ml I I I I II : I I I I I II I I I I I I : I I I I I I I I I I I I I I I : : I I I I I I I I I I II I I I I 
orf 3ng MSKAVKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 3-1. pep SMRDALDSDGIPLPDGERLTPFGKKLRAASLDELPELWNILKGEMSLVGPRPLLMQYLPL 

I I I I I I I I I I I I I I I : II I I I I I I I I I : I I I I I I I I I I : MINIMUM 

orf3ng SMRDALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPL 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 3-1. pep YDNFQNRRHEMKPGITGWAQVNGRNALSWDEKFACDVWYIDHFSLCLDIKILLLTVKKVL 



190 200 210 220 230 240 

I KEG I SAQGEATMPPFTGKRKLAVVGAGGHGKWADLAAALGRYRE IVFLDDRAQGSVNG 
I I I I I M M I M I I I I : I : I I I I I : I I I 1 I I II I I : I I I I I I I I I II II I I : M I M I 
IKEGISAQGEATMPPFAGNRKLAVIGAGGHGKWAELAAALGTYGEIVFLDDRTQGSVNG 

190 200 210 220 230 240 

250 260 270 280 290 300 

FSVIGTTLLLENSLSPEQYDVAVAVGNNRIRRQIAEKAAALGFALPVLVHPDATVSPSAT 
I I I I I I I I I I II I II I I : I : : II II I I I II II I : I : M M II II I I : II II I M I M 
FPVIGTTLLLENSLSPEQFDITVAVGNNRIRRQITENAAALGFKLPVLIHPDATVSPSAI 

250 260 270 280 290 300 

310 320 330 340 350 360 

VGQGSWMAKAWQAGSVLKDGVIVNTAATVDHDCLLNAFVHISPGAHLSGNTHIGEESW 
: I I I I I I I I I I I I I I M I I I I I II I I I I I II II I I I I : II II I I I II II I I I I : I 1 I M 
IGQGSVVMAKAWQAGSVLKDGVIVNTAATVDHDCLLDAFVHISPGAHLSGNTRIGEESR 

310 320 330 340 350 360 

370 380 390 400 410 
Query: 


5 


Sbjct: 


3 


Query: 


65 


Sbjct: 


63 


Query: 


125 


Sbjct: 


123 


Query: 


185 


Sb j ct : 


183 



orf3-l pep IGTGACSRQQIRIGSRATIGAGAVWRDVSDGMTVAGNPAKPLPRKNPETSTAX 

M | 1 I : | I : I I I 11 I : I I : I I I I I I I I I I I M I I I = I = I I I 

nrf o na IGTGACSRQQTTVGSGVTAGAGAVIVCDIPDGMTVAGNPAKPLTGKNPKTGTAX 
y 370 380 390 400 410 

In addition, ORF3ng shows significant homology with a hypothetical protein from B.subtilis: 

gnl|PID|e238668 (Z71928) hypothetical protein [Bacillus subtilis] 

>gi|1945702|gnl|PID|e313004 (Z94043) hypothetical protein [Bacillus subtilis] 
>gi|2635938|gnl|PID|e1186113 (Z99121) similar to capsular polysaccharide 
biosynthesis [Bacillus subtilis] Length = 202 

Score = 235 bits (594), Expect = 3e-61 

Identities = 114/195 (58%), Positives = 142/195 (72%) 

VKRLFDIIASASGLIVLSPVFLVLIYLIRKNLGSPVFFIRERPGKDGKPFKMVKFRSMRD 64 
+KRLFD+ A+ L S + L I ++R +GSPVFF + RPG GKPF + KFR+M D 
LKRLFDLTAAIFLLCCTSVIILFTIAWRLKIGSPVFFKQVRPGLHGKPFTLYKFRTMTD 62 

ALDSDGIPLPDSERLTDFGKKLRATSLDELPELWNVLKGEMSLVGPRPLLMQYLPLYNKF 124 
DS G LPD RLT G+ +R S+DELP+L NVLKG++SLVGPRPLLM YLPLY + 



Q RRHE+KPGITGWAQ+NGRNA+SW++KF DVWY DN+SF+LD+KIL LTV+KVL+ EG 



The hypothetical product of the yvfc gene shows similarity to EXOY of R. meliloti, an 
exopolysaccharide production protein. Based on this and on the two predicted transmembrane 
regions in the homologous N. gonorrhoeae sequence, it is predicted that these proteins, or their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 4 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 19>: 

1 ..AACCATATGG CGATTGTCAT CGACGAATAC GGCGGCACAT CCGGCTTGGT 

51 CACCTTTGAA GACATCATCG AGCAAATCGT CGGCGAAATC GAAGACGAGT 

101 TTGACGAAGA CGATAGCGCC GACAATATCC ATGCCGTTTC TTCAGACACG 

151 TGGCGCATCC ATGCAGCTAC CGAAATCGAA GACATCAACA CCTTCTTCGG 

201 CACGGAATAC AGCATCGAAG AAGCCGACAC CATT.GGCGG CCTGGTCATT 

251 CAAGAGTTGG GACATCTGCC CGTGCGCGGC GAAAAAGTCC TTATCGGCGG 

301 TTTGCAGTTC ACCGTCGCAC GCGCCGACAA CCGCCGCCTG CATACGCTGA 

351 TGGCGACCCG CGTGAAGTAA GC ACCGC CGTTTCTGCA 

401 CAGTTTAG 

This corresponds to amino acid sequence <SEQ ID 20; ORF5>: 

1 ..NHMAIVIDEY GGTSGLVTFE DIIEQIVGEI EDEFDEDDSA DNIHAVSSDT 
51 WRIHAATEIE DINTFFGTEY SIEEADTIXR PGHSRVGTSA RARRKSPYRR 
101 FAVHRRTRRQ PPPAYADGDP REVS .... XR RFCTV* 

Further sequence analysis revealed the complete DNA sequence to be <SEQ ID 21>: 

1 ATGGACGGCG CACAACCGAA AACGAATTTT TTTGAACGCC TGATTGCCCG 

51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC 

101 AGGCGCACGA GCAGGAAGTT TTTGATGCGG ATACGCTTTT AAGATTGGAA 

151 AAAGTCCTCG ATTTTTCCGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 

201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAG CGCATCACCG 

251 CCTACGTTAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 
301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT 

351 GTTTAACCCC GAGCAGTTCC ACCTCAAATC CATTCTCCGC CCCGCCGTCT 

401 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA 

451 CAGCGCAACC ATATGGCGAT TGTCATCGAC GAATACGGCG GCACATCCGG 

501 CTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGC GAAATCGAAG 

551 ACGAGTTTGA CGAAGACGAT AGCGCCGACA ATATCCATGC CGTTTCTTCC 

601 GAACGCTGGC GCATCCATGC AGCTACCGAA ATCGAAGACA TCAACACCTT 

651 CTTCGGCACG GAATACAGCA GCGAAGAAGC CGACACCATT CGGCCTGGTC 

701 ATTCAAGAGT TGGGACATCT GCCCGTGCGC GGCGAAAAAG TCCTTATCGG 

751 CGGTTTGCAG TTCACCGTCG CACGCGCCGA CAACCGCCGC CTGCATACGC 

801 TGATGGCGAC CCGCGTGAAG TAAGCACCGC CGTTTCTGCA CAGTTTAGGA 

851 TGACGGTACG GGCGTTTTCT GTTTCAATCC GCCCCATCCG CCAAACATAA 

This corresponds to amino acid sequence <SEQ ID 22; ORF5-1>: 

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLLRLE 
51 KVLDFSDLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG EIEDEFDEDD SADNIHAVSS 
201 ERWRIHAATE IEDINTFFGT EYSSEEADTI RPGHSRVGTS ARARRKSPYR 
251 RFAVHRRTRR QPPPAYADGD PREVSTAVSA QFRMTVRAFS VSIRPIRQT* 
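
The position of the originally-identified partial sequence <SEQ ID 19> within the complete sequence <SEQ ID 21> can be found by a plain substring search once the spacing of the printed rows is removed. The following is a minimal sketch; the function name locate_fragment and its start parameter are illustrative and not part of the specification, and only two rows of the complete sequence are searched here.

```python
def locate_fragment(fragment: str, complete: str, start: int = 1) -> int:
    """Return the coordinate of `fragment` within `complete`, or -1 if absent.

    `start` is the coordinate of the first base of `complete`, so an excerpt
    of a longer sequence can be searched without losing its numbering.
    """
    def clean(seq: str) -> str:
        return "".join(seq.split()).upper()

    index = clean(complete).find(clean(fragment))
    return -1 if index < 0 else index + start


# Rows 451 and 501 of SEQ ID 21, and the first row of SEQ ID 19 without its
# leading dots, as printed in this example.
excerpt = (
    "CAGCGCAACC ATATGGCGAT TGTCATCGAC GAATACGGCG GCACATCCGG "
    "CTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGC GAAATCGAAG"
)
partial = "AACCATATGG CGATTGTCAT CGACGAATAC GGCGGCACAT CCGGCTTGGT"
position = locate_fragment(partial, excerpt, start=451)
print(position)
```

The reported coordinate is the first base of the codon for residue 153 of ORF5-1, which agrees with the partial protein beginning at NHMAIVIDEY.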

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 23>: 

1 ATGGACGGCG CACAACCGAA AACAAATTTT TTNNAACGCC TGATTGCCCG 
51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTGACC CTGTTGCGCC 
101 AAGCGCACGA ACAGGAAGTA TTTGATGCGG ATACGCTTTT AAGATTGGAA 
151 AAAGTCCTCG ATTTTTCTGA TTTGGAAGTG CGCGACGCGA TGATTACGCG 
201 CAGCCGTATG AACGTTTTAA AAGAAAACGA CAGCATCGAA CGCATCACCG 
251 CCTACGTTAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGTGAAGAC 
301 AAAGACGAAG TTTTGGGTAT TTTGCACGCC AAAGACCTGC TCAAATATAT 
351 GTTCAACCCC GAGCAGTTCC ACCTCAAATC GATATTGCGC CCTGCCGTCT 
401 TCGTCCCCGA AGGCAAATCG CTGACCGCCC TTTTAAAAGA GTTCCGCGAA 
451 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 
501 TTTGGTAACT TTTGAAGACA TCATCGAGCA AATCGTCGGC GACATCGAAG 
551 ATGAGTTTGA CGAAGACGAA AGCGCGGACA ACATCCACGC CGTTTCCGCC 
601 GAACGCTGGC GCATCCACGC GGCTACCGAA ATCGAAGACA TCAACGCCTT 
651 TTTCGGCACG GAATACAGCA GCGAAGAAGC CGACACCATC GGCGGCCNTG 
701 GTCATTCAGG AATTGGNACA CCTGCCCGTG CGCGGCGAAA AAGTCNTTAT 
751 CGGCGNNTTG CANTTCACNG TCGCCNGCGC NGACAACCGC CGCCTGCATA 
801 CGCTGATGGC GACCCGCGTG AAGTAAGCTC CGCCGTTTCT GTACAGTTTA 
851 GGATGACGGT ACGGGCGTTT TCTGTTTCAA TCCGCCCCAT CCGCCANACA 
901 TAA 

This encodes a protein having amino acid sequence <SEQ ID 24; ORF5a>: 

1 MDGAQPKTNF XXRLIARLAR EPDSAEDVLT LLRQAHEQEV FDADTLLRLE 
51 KVLDFSDLEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSILR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADNIHAVSA 
201 ERWRIHAATE IEDINAFFGT EYSSEEADTI GGXGHSGIGT PARARRKSXY 
251 RRXAXHXRXR XQPPPAYADG DPREVSSAVS VQFRMTVRAF SVSIRPIRXT 
301 * 

The originally-identified partial strain B sequence (ORF5) shows 54.7% identity over a 124aa 
overlap with ORF5a: 

50 10 20 30 

orf5.pep NHMAIVIDEYGGTSGLVTFEDIIEQIVGEI 

I I I I I I I I I I I M I I I M I I I I I I I I I I : I 
orf5a FHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVGDI 
130 140 150 160 170 180 

55 

40 50 60 70 80 90 

orf5.pep EDEFDEDDSADNIHAVSSDTWRIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 
I I I I I I I : I I I I I ! I I I : : I I I M I I I I I I I I : I I I I I I I I I I I I I III : II I 
100 110 120 130 

orf5 pep RARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSXXXXXRRFCTV 

| | | | | | HI | | I : I II M I I I I I I I I i M 
orf5a rarrkSXYRRXAXHXRXRXQPPPAYADGDPREVSSAVSVQFRMTVRAFSVSIRPIRXTX 
250 260 270 280 290 300 

The complete strain B sequence (ORF5-1) and ORF5a show 92.7% identity in 300 aa overlap: 

10 20 30 40 50 60 

orf5a pep MDGAQPKTNFXXRLIARLAREPDSAEDVLTLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 

' P P | | | | | | | | || | | | | | | I I I I M I I I I I : I I I I I 1 I I I I I I I I I I I I I M I I M 

orf 5-1 MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 



20 



70 80 90 100 110 120 

orf5a pep RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 
| | | | I | | | | I I I I I II I I I I ! I I I I I I I I I I II II I I I I I II I I I I I I I I I I I I I I I M I 
orf 5-1 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf 5a pep EQFHLKSILRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVG 
I I I I I I I I I I M I 1 I I I I I II I I I II I I I I I I i I I I I I I I I I I I I I I I I I I I I I I 1 1 I I I 
orf 5-1 EQFHLKS ILRPAVFVPEGKSLTALLKE FREQRNHMAIVIDEYGGTSGLVT FE DI IEQIVG 

130 140 150 160 170 180 

25 

190 200 210 220 230 240 

orf 5a pep DIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADTIGGXGHSGIGT 
: | | I I I I I I : I ] II I I I I I : I I I I I I I I I I I I I I I : I I I I I I I I M M M III : I I 
orf 5-1 EIEDEFDEDDSADNIHAVSSERWRIHAATEIEDINTFFGTEYSSEEADTIRP-GHSRVGT 
30 190 200 210 220 230 

250 260 270 280 290 300 

orf 5a . pep PARARRKSXYRRXAXHXRXRXQPPPAYADGDPREVSSAVSVQFRMTVRAFSVSIRPIRXT 

I I I I I I I III I I 1 : I I I II I I I I I I I I I I I : I I I : I I M I I II I I I 

35 orf 5-1 SARARRKSPYRRFAVHRRTRRQPPPAYADGDPREVSTAVSAQFRMTVRAFSVSIRPIRQT 

240 250 260 270 280 290 

Further work identified a partial DNA sequence in N. gonorrhoeae <SEQ ID 25> which encodes 
a protein having amino acid sequence <SEQ ID 26; ORF5ng>: 

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE 
51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 
201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY 
251 RRFAVHRRPR RQPPPAHADG DPREVSRACP HRRFCTV* 

Further analysis revealed the complete gonococcal nucleotide sequence <SEQ ID 27> to be: 

1 ATGGACGGCG CACAACCGAA AACAAATTTT TTTGAACGCC TGATTGCCCG 
51 ACTCGCCCGC GAACCCGATT CCGCCGAAGA CGTATTAAAC CTGCTTCGGC 
101 AGGCGCACGA ACAGGAAGTT TTTGATGCCG ACACACTGAC CCGGCTGGAA 
151 AAAGTATTGG ACTTTGCCGA GCTGGAAGTG CGCGATGCGA TGATTACGCG 
201 CAGCCGCATG AACGTATTGA AAGAAAACGA CAGCATCGAA CGCATCACCG 
251 CCTACGTCAT CGATACCGCC CATTCGCGCT TCCCCGTCAT CGGCGAAGAC 
301 AAAGACGAAG TTTTGGGCAT TTTGCACGCC AAAGACCTGC TCAAATATAT 
351 GTTCAACCCC GAGCAGTTCC ACCTGAAATC CGTCTTGCGC CCTGCCGTTT 
401 TCGTGCCCGA AGGCAAATCT TTGACCGCCC TTTTAAAAGA GTTCCGCGAA 
451 CAGCGCAACC ATATGGCAAT CGTCATCGAC GAATACGGCG GCACGTCGGG 
501 TTTGGTCACC TTTGAAGACA TCATCGAGCA AATCGTCGGT GACATCGAAG 
551 ACGAGTTTGA CGAAGACGAA AGCGccgacg acatCCACTC cgTTTccgCC 
601 GAACGCTGGC GCATCCacgc ggctaCCGAA ATCGAAGaca TCAACGCCTT 
651 TTTCGGTACG GAatacggca gcgaagaagc cgacaccatc cggcggctTG 
701 GTCATTCAGG AATTGGGACA CCTGCCCGTG CGCGGCGAAA AAGTCCTTAt 
751 cggcgGTTTG Cagttcaccg tCGCCCGCGC CGACAACCGC CGCCTGCACA 
801 CGCTGATGGC GACCCGCGTG AAGTAAGCAG AGCCTGCCcg AccgccgttT 
851 CTGCacAGTT TAGGatgACG gtaCGGTCGT TTTCTGTTTC AATCCGCCCC 
901 ATCCGCCAAA CATAA 

This encodes a protein having amino acid sequence <SEQ ID 28; ORF5ng-1>: 

1 MDGAQPKTNF FERLIARLAR EPDSAEDVLN LLRQAHEQEV FDADTLTRLE 
51 KVLDFAELEV RDAMITRSRM NVLKENDSIE RITAYVIDTA HSRFPVIGED 
101 KDEVLGILHA KDLLKYMFNP EQFHLKSVLR PAVFVPEGKS LTALLKEFRE 
151 QRNHMAIVID EYGGTSGLVT FEDIIEQIVG DIEDEFDEDE SADDIHSVSA 
201 ERWRIHAATE IEDINAFFGT EYGSEEADTI RRLGHSGIGT PARARRKSPY 
251 RRFAVHRRPR RQPPPAHADG DPREVSRACP TAVSAQFRMT VRSFSVSIRP 
301 IRQT* 

The originally-identified partial strain B sequence (ORF5) shows 83.1% identity over a 135aa 
overlap with the partial gonococcal sequence (ORF5ng): 

orf5 NHMAIVIDEYGGTSGLVTFEDIIEQIVGEI 3 0 

I I I I I I I I I I M I I II I I I I I I I I I I I I : I 
orf5ng FHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVTFEDIIEQIVGDI 182 

orf5 EDEFDEDDSADNIHAVSSDTWRIHAATEIEDINTFFGTEYSIEEADTIXRPGHSRVGTSA 90 

I I I I I I I : I I I : I I : II : : I I I I I I 1 I I II I I : I I I I II : MINI I Ml : I I I 
orf 5ng EDEFDEDESADDIHSV3AERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSGIGTPA 242 

orf5 RARRK S P YRR FAVHRRT RRQ P P P AYADGD PRE V S X RRFCTV 131 

I I N I I I I I I I I I I I I M I I I I i : I I I I M I I I I I I I II 

orf5ng RARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPHRRFCTV 287 

The complete strain B and gonococcal sequences (ORF5-1 & ORF5ng-1) show 92.4% identity in 
304 aa overlap: 

10 20 30 40 50 60 

orf 5ng-l . pep MDGAQPKTNFFERLIARLAREPDSAEDVLNLLRQAHEQEVFDADTLTRLEKVLDFAELEV 
I I M I I II I I I I M I I I I I i I I I I I I ( I I I I I M II I I I I I I I I I I I I I I I M I :: I I I 
O r f 5 - 1 MDGAQPKTNFFERL I ARLAREPDSAEDVLNLLRQAHEQEVFDADTLLRLEKVLDFS DLEV 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 5ng-l . pep RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYMFNP 

orf 5-1 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGI^ 

"70 80 90 100 110 120 

130 140 150 160 170 180 

orf 5ng-l . pep EQFHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGLVT FEDIIEQIVG 
I I I I I I I : I I I I M I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I M I I I I I I I I | | | 
o r f 5 - 1 EQFHLKS I LRPAVFV PEGKS LTALLKE FRE QRNHMAI VI DEYGGT SGLVT FEDIIEQIVG 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 5ng-l . pep DIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEADTIRRLGHSGIGT 
: M I I I I I I : I I I : I I : I I : I I I I I I I I I I I I I I I : I I I I II : I I I I I I II III : I I 
orf 5-1 EIEDEFDEDDSADNIHAVSSERWRIHAATEIEDINTFFGTEYSSEEADT IRP-GHSRVGT 

190 200 210 220 230 

250 260 270 280 290 300 

orf5ng-l.pep PARARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQFRMT VRSFSVSIRP 

... I 111:111111111 I I I I II I I I I I | : I | | | M I 

orf 5-1 SARARRKSPYRRFAVHRRTRRQPPPAYADGDPREVS TAVSAQFRMTVRAFSVSIRP 

240 250 260 270 280 290 



orf 5ng-l .pep 
orf5-l 

Computer analysis of these amino acid sequences indicates a putative leader sequence, and 
identified the following homologies: 

Homology with hemolysin homolog TlyC (accession U32716) of H. influenzae 
ORF5 and TlyC proteins show 58% aa identity in 77 aa overlap (BLASTp). 

ORF5   2 HMAIVIDEYGGTSGLVTFEDIIEQIVGEIEDEFDEDDSADNIHAVSSDTWRIHAATEIED 61 
         HMAIV+DE+G  SGLVT EDI+EQIVG+IEDEFDE++ AD I  +S  T+ + A T+I+D 
TlyC 166 HMAIVVDEFGAVSGLVTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDD 224 

ORF5  62 INTFFGTEYSIEEADTI 78 
          N  F T++  EE DTI 
TlyC 225 FNAQFNTDFDDEEVDTI 241 

ORF5ng-l also shows significant homology with TlyC: 

SCORES Initl: 301 Initn: 419 Opt: 668 

Smith-Waterman score: 668; 45.9% identity in 242 aa overlap 

10 20 30 40 50 

orf 5ng-l . pep MDGAQPKTNFFERLIARLAR-EPDSAEDVLNLLRQAHEQEVFDADTLTRLEK 
I | | : | : : | : : | : | :::::: | :::::::: | : | : | 

tlyc_haein MNDEQQNSNQSENTKKPFFQSLFGRFFQGELKNREELVEVIRDSEQNDLIDQNTREMIEG 
10 20 30 40 50 60 

60 70 80 90 100 109 

orf 5ng-l . pep VLDFAELEVRDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGE — DKDEVLGILH 
I : : : I I I : I i I II II:: :::::::: : I : : \\ \ \\ \ \ \ : : | : \ : : : |j || 

tlyc_haein VMEIAELRVRDIMIPRSQIIFIEDQQDLNTCLNTIIESAHSRFPVIADADDRDNIVGILH 
70 80 90 100 110 120 

110 120 130 140 150 160 

orf 5ng-l . pep AKDLLKYMF-NPEQFHLKSVLRPAVFVPEGKSLTALLKEFREQRNHMAIVIDEYGGTSGL 

I I I I I I : : : I I I : I : I I I : I : I I I : 1 : : I I : I I : I I I I I I : I | : I : : | | | 
tlyc_haein AKDLLKFLREDAEVFDLSSLLRPVVIVPESKRVDRMLKDFRSERFHMAIVVDEFGAVSGL 

130 140 150 160 170 180 

170 180 190 200 210 220 

orf 5ng-l . pep VTFEDIIEQIVGDIEDEFDEDESADDIHSVSAERWRIHAATEIEDINAFFGTEYGSEEAD 
M : 1 1 I : I I I I I I I I I I I I I : I II I : : : | : : : : | : I : I : I I : I : : : | | : | 
tlyc_haein VTIEDILEQIVGDIEDEFDEEEIAD-IRQLSRHTYAVRALTDIDDFNAQFNTDFDDEEVD 
190 200 210 220 230 

230 240 250 260 270 280 

orf 5ng-l . pep TIRRLGHSGIG-TPARARRKSPYRRFAVHRRPRRQPPPAHADGDPREVSRACPTAVSAQF 

II I : : I II: 

tlyc_haein TIGGLIMQTFGYLPKRGEEIILKNLQFKVTSADSRRLIQLRVTVPDEHLAEMNNVDEKSE 
240 250 260 270 280 290 

Homology with a hypothetical secreted protein from E.coli: 

ORF5a shows homology to a hypothetical secreted protein from E.coli: 

sp|P77392|YBEX_ECOLI HYPOTHETICAL 33.3 KD PROTEIN IN CUTE-ASNB INTERGENIC REGION 
>gi|1778577 (U82598) similar to H. influenzae [Escherichia coli] >gi|1786879 
(AE000170) f292; This 292 aa ORF is 23% identical (9 gaps) to 272 residues of an 
approx. 440 aa protein YTFL_HAEIN SW: P44717 [Escherichia coli] Length = 292 

Score = 212 bits (533), Expect = 3e-54 

Identities = 112/230 (48%), Positives = 149/230 (64%), Gaps = 3/230 (1%) 

Query: 2 DGAQPKTNFXXRLIARLAR-EPDSAEDVLTLLRQAHEQEVFDADTLLRLEKVLDFSDLEV 60 

D K F L+++L EP + +++L L+R + + ++ D DT LE V+D +D V 

Sbjct: 10 DTISNKKGFFSLLLSQLFHGE PKNRDELLALIRDSGQNDLIDEDTRDMLEGVMDIADQRV 69 

Query: 61 RDAMITRSRMNVLKENDSIERITAYVIDTAHSRFPVIGEDKDEVLGILHAKDLLKYM-FN 119 
Query 180 GDIEDEFDEDESADNIHAVSAERWRIHAATEIEDINAFFGTEYSSEEADT 229 

G+IEDE+DE++ D +S W + A IED N FGT +S EE DT 
Sbjct: 190 GEIEDEYDEEDDID-FRQLSRHTWTVRALASIEDFNEAFGTHFSDEEVDT 238 
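
The percentages in the BLAST summary line above can be re-derived from the reported counts. The following is a minimal sketch, not part of the original analysis: the helper names `parse_blast_stats` and `truncated_percent` are hypothetical, and the assumption that the report rounds percentages down is inferred only from the figures printed in this document.

```python
import re

# Matches fields such as "Identities = 112/230 (48%)" in a BLAST summary line.
FIELD = re.compile(r"(\w+)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_blast_stats(line):
    """Return {field name: (count, total, reported percent)} for each field found."""
    return {name: (int(n), int(d), int(p)) for name, n, d, p in FIELD.findall(line)}


def truncated_percent(count, total):
    """Integer percentage rounded down (assumed reporting convention)."""
    return 100 * count // total


line = "Identities = 112/230 (48%), Positives = 149/230 (64%), Gaps = 3/230 (1%)"
stats = parse_blast_stats(line)
for name, (count, total, reported) in stats.items():
    # Print the reported value next to the recomputed one for comparison.
    print(name, f"{count}/{total}", reported, truncated_percent(count, total))
```

Under that assumption, each recomputed value agrees with the percentage printed in the summary line.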

Based on this analysis, including the amino acid homology to the TlyC hemolysin-homologue from 
H. influenzae (hemolysins are secreted proteins), it was predicted that the proteins from 
N. meningitidis and N. gonorrhoeae are secreted and could thus be useful antigens for vaccines or 
diagnostics. 

ORF5-1 (30.7kDa) was cloned in the pGex vector and expressed in E. coli, as described above. The 
products of protein expression and purification were analyzed by SDS-PAGE. Figure 2A shows 
the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein was used 
to immunise mice, whose sera were used for Western blot analysis (Figure IB). These experiments 
confirm that ORF5-1 is a surface-exposed protein, and that it is a useful immunogen. 

Example 5 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 29>: 

1 ATGCGCGGCG GCAGGCCGGA TTCCGTTACC GTGCAGATTA TCGAAGGTTC 
51 GCGTTTTTCG CATATGAGGA AAGTCATCGA CGCAACGCCC GACATCGGAC 
101 ACGACACCAA AGGCTGGAGC AATGAAAAAC TGATGGCGGA AGTTGCGCCC 
151 GATGCCTTCA GCGGCAATCC TGAAgGGCAG TTTTTCCCCG ACAGCTACGA 
201 AATCGATGCG GGCGGCAGTG ATTTGCAGAT TTACCAAACC GCCTACAAgG 
251 GCGATGCAAC GCCGCCTGAA TGAcjGGCATG GGAAAGCAGG CAGGACGGGC 
301 TGCCTTATAA AAACCCTTAT GAAATGCTGA TTATGGCGAr CCTGGTCGAA 
351 AAGGAAACAG GGCATGAAGC CGAsCsCGAC CATGTcGCTT CCGTCTTCGT 
401 CAACCGCCTG AAAATCGGTA TGCGCCTGCA AACCgAssCG TCCGTGATTT 
451 ACGGCATGGG TGCGGCATAC AAGGGCAAAA TCCGTAAAGC CGACCTGCGC 
501 CGCGACACGC CGTACAACAC CTACACGCGC GGCGGTCTGC CGCCAACCCC 
551 GATTGCGCTG CCC. 

This corresponds to the amino acid sequence <SEQ ID 30; ORF7>: 

1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP 
51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWESRQDGL 
101 PYKNPYEMLI MAXLVEKETG HEAXXDHVAS VFVNRLKIGM RLQTXXSVIY 
151 GMGAAYKGKI RKADLRRDTP YNTYTRGGLP PTPIALP.. 

Further sequence analysis revealed the complete DNA sequence <SEQ ID 31>: 

1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTGTCGGC 
51 AGCCGTTTTC GCCGCGCTGC TTTTTGTTCC TAAGGATAAC GGCAGGGCAT 
101 ACCGAATCAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA 
151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC 
201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGATTGC 
251 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG 
301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT 
351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGGACACGAC ACCAAAGGCT 
401 GGAGCAATGA AAAACTGATG GCGGAAGTTG CGCCCGATGC CTTCAGCGGC 
451 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG 
501 CAGTGATTTG CAGATTTACC AAACCGCCTA CAAGGCGATG CAACGCCGCC 

551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT 

601 TATGAAATGC TGATTATGGC GAGCCTGGTC GAAAAGGAAA CAGGGCATGA 

651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG 

701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA 

751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA 

801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATTGCG CTGCCCGGCA 

851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGCGAAAA ATACCTGTAT 

901 TTCGTGTCCA AAATGGACGG CACGGGCTTG AGCCAGTTCA GCCATGATTT 

951 GACCGAACAC AATGCCGCCG TCCGCAAATA TATTTTGAAA AAATAA 

This corresponds to the amino acid sequence <SEQ ID 32; ORF7-1>: 

1 MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK 

51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR 

101 PDSVTVQIIE GSRFSHMRKV IDATPDIGHD TKGWSNEKLM AEVAPDAFSG 

151 NPEGQFFPDS YEIDAGGSDL QIYQTAYKAM QRRLNEAWES RQDGLPYKNP 

201 YEMLIMASLV EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA 

251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY 

301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K* 
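
The relationship between the partial ORF7 sequence and the complete ORF7-1 sequence can be checked by locating the former within the latter. The sketch below is illustrative only: the function name `find_partial` is hypothetical, and treating each X in the partial sequence as a wildcard is an assumption made for this example, not a method described in the specification.

```python
def find_partial(partial, complete, wildcard="X"):
    """Return the 0-based offset at which `partial` matches `complete`,
    treating `wildcard` residues in `partial` as matching anything; -1 if none."""
    n = len(partial)
    for start in range(len(complete) - n + 1):
        window = complete[start:start + n]
        if all(p == wildcard or p == c for p, c in zip(partial, window)):
            return start
    return -1


# Partial ORF7 (SEQ ID 30), as listed above; X marks undetermined residues.
ORF7 = (
    "MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAP"
    "DAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGL"
    "PYKNPYEMLIMAXLVEKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIY"
    "GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALP"
)

# Complete ORF7-1 (SEQ ID 32), as listed above, without the stop symbol.
ORF7_1 = (
    "MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRK"
    "LAEDRIVFSRHVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGR"
    "PDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSG"
    "NPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNP"
    "YEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAA"
    "YKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY"
    "FVSKMDGTGLSQFSHDLTEHNAAVRKYILKK"
)

offset = find_partial(ORF7, ORF7_1)
# Report the matched region in 1-based residue numbering.
print(offset + 1, offset + len(ORF7))
```

With the sequences as transcribed here, the partial sequence aligns to residues 96 to 282 of ORF7-1.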

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical protein encoded by yceg gene (accession P44270) of H. influenzae 

ORF7 and yceg proteins show 44% aa identity in 192 aa overlap: 

ORF7 1 MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMA EVAPDAFSG 55 

+ G+ V+ IEG F RK ++ P + K SNE++ A ++ + 

yceg 102 LNSGKEVQFNVKWIEGKTFKDWRKDLENAPHLVQTLKDKSNEEIFALLDLPDIGQNLELK 161 

ORF7 56 NPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLV 115 

N EG +PD+Y +DL++ + + + M++ LN+AW R + LP NPYEMLI+A +V 

yceg 162 NVEGWLYPDTYNYTPKSTDLELLKRSAERMKKALNKAWNERDEDLPLANPYEMLILASIV 221 

ORF7 116 EKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYT 175 

EKETG VASVF+NRLK M+LQT +VIYGMG Y G IRK DL TPYNTY 

yceg 222 EKETGIANERAKVASVFINRLKAKMKLQTDPTVIYGMGENYNGNIRKKDLETKTPYNTYV 281 

ORF7 176 RGGLPPTPIALP 187 

GLPPTPIA+P 
yceg 282 IDGLPPTPIAMP 293 

The complete length YCEG protein has sequence: 

1 MKKFLIAILL LILILAGVAS FSYYKMTEFV KTPVNVQADE LLTIERGTTS 

51 SKLATLFEQE KLIADGKLLP YLLKLKPELN KIKAGTYSLE NVKTVQDLLD 

101 LLNSGKEVQF NVKWIEGKTF KDWRKDLENA PHLVQTLKDK SNEEIFALLD 

151 LPDIGQNLEL KNVEGWLYPD TYNYTPKSTD LELLKRSAER MKKALNKAWN 

201 ERDEDLPLAN PYEMLILASI VEKETGIANE RAKVASVFIN RLKAKMKLQT 

251 DPTVIYGMGE NYNGNIRKKD LETKTPYNTY VIDGLPPTPI AMPSESSLQA 

301 VANPEKTDFY YFVADGSGGH KFTRNLNEHN KAVQEYLRWY RSQKNAK 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF7 shows 95.2% identity over a 187aa overlap with an ORF (ORF7a) from strain A of N. 
meningitidis: 

10 20 30 

orf7 .pep MRGGRPDSVTVQIIEGSRFSHMRKVIDATP 

I I I I I I I I M I I I I I I I I I I I I I I M I I I I 
orf7a AAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDATP 
70 80 90 100 110 120 

40 50 60 70 80 90 

orf 7 . pep DIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLN 
100 110 120 130 140 150 

EAWESRQDGLPYKNPYEMLIMAXLVEKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIY 
I I I I I I I I I I I I I M I I I I I ! I I : I I I I I II I 1 I I I I I I I I I I I I I I M II I I I I 

EAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSVIY 
190 200 210 220 230 240 

160 170 180 

GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALP 
I I I I I I I I II II II II I II I I I I I i I I I M I I I M I I 

GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLYFVSKM 
250 260 270 280 290 300 



The complete length ORF7a nucleotide sequence <SEQ ID 33> is: 

1 ATGTTGAGAA AATTGTTGAA ATGGTCTGCC GTTTTTTTGA CCGTATCGGC 

51 AGCCGTTTTC GCCGCGCTGC TTTTCGTCCC TAAAGACAAC GGCAGGGCAT 

101 ACAGGATTAA AATTGCCAAA AACCAGGGTA TTTCGTCGGT CGGCAGGAAA 

151 CTTGCCGAAG ACCGCATCGT GTTCAGCAGG CATGTTTTGA CGGCGGCGGC 

201 CTACGTTTTG GGTGTGCACA ACAGGCTGCA TACGGGGACG TACAGACTGC 

251 CTTCGGAAGT GTCTGCTTGG GATATCTTGC AGAAAATGCG CGGCGGCAGG 

301 CCGGATTCCG TTACCGTGCA GATTATCGAA GGTTCGCGTT TTTCGCATAT 

351 GAGGAAAGTC ATCGACGCAA CGCCCGACAT CGAACACGAC ACCAAAGGCT 

401 GGAGCAATGA AAAACTGATG GCGGAAGTTG CCCCTGATGC CTTCAGCGGC 

451 AATCCTGAAG GGCAGTTTTT CCCCGACAGC TACGAAATCG ATGCGGGCGG 

501 CAGCGATTTA CGGATTTACC AAATCGCCTA CAAGGCGATG CAACGCCGAC 

551 TGAATGAGGC ATGGGAAAGC AGGCAGGACG GGCTGCCTTA TAAAAACCCT 

601 TATGAAATGC TGATTATGGC GAGCCTGATC GAAAAGGAAA CAGGGCATGA 

651 AGCCGACCGC GACCATGTCG CTTCCGTCTT CGTCAACCGC CTGAAAATCG 

701 GTATGCGCCT GCAAACCGAC CCGTCCGTGA TTTACGGCAT GGGTGCGGCA 

751 TACAAGGGCA AAATCCGTAA AGCCGACCTG CGCCGCGACA CGCCGTACAA 

801 CACCTACACG CGCGGCGGTC TGCCGCCAAC CCCGATCGCG CTGCCCGGCA 
851 AGGCGGCACT CGATGCCGCC GCCCATCCGT CCGGTGAAAA ATACCTGTAT 
901 TTCGTGTCCA AAATGGACGG TACGGGCTTG AGCCAGTTCA GCCATGATTT 
951 GACCGAACAC AACGCCGCCG TTCGCAAATA TATTTTGAAA AAATAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 34>: 

1 MLRKLLKWSA VFLTVSAAVF AALLFVPKDN GRAYRIKIAK NQGISSVGRK 
51 LAEDRIVFSR HVLTAAAYVL GVHNRLHTGT YRLPSEVSAW DILQKMRGGR 
101 PDSVTVQIIE GSRFSHMRKV IDATPDIEHD TKGWSNEKLM AEVAPDAFSG 
151 NPEGQFFPDS YEIDAGGSDL RIYQIAYKAM QRRLNEAWES RQDGLPYKNP 
201 YEMLIMASLI EKETGHEADR DHVASVFVNR LKIGMRLQTD PSVIYGMGAA 
251 YKGKIRKADL RRDTPYNTYT RGGLPPTPIA LPGKAALDAA AHPSGEKYLY 
301 FVSKMDGTGL SQFSHDLTEH NAAVRKYILK K* 



A leader peptide is underlined. 



ORF7a and ORF7-1 show 98.8% identity in 331 aa overlap: 



MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSR 

M I I I I I I I I II I I I I I I I I I I I II I I I I I I I M II 1 I I I I I I I II I I | 

MLRKLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSR 
10 20 30 40 50 60 

70 80 90 100 110 120 

HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKV 
I I N M I I I II I M 1 I N I M M I I I I N II I I I I I I 1 I N ill I IN I I I I I I 1 I i I I I 
HVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKV 

70 80 90 100 110 120 

130 140 150 160 170 180 
orf7a.pep IDATPDIEHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLRIYQIAYKAM 
I I I I I I I I I I 1 I I I I I I I I I I I I 1 I 1 I I I I I I I I I I I I I 1 I I I I I I I I I : I M I I II I 
orf7-1 IDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAM 
130 140 150 160 170 180 



190 200 210 220 230 240 

orf7a.pep QRRLNEAWESRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTD 
M I I I I I I I N I I I I II I I I I M I I I I I I : II I I II I I I I II M II II II II I I I I II M 
orf 7-1 QRRLNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTD 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 7 a . pep PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY 
I I I II II M I I I I I I II I I I I II I I M I I I I I 1 I II II I I II II II I I I I I I I II II I M 
orf 7-1 PSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLY 

250 260 270 280 290 300 



310 320 330 

or f 7a . pep FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX 

orf 7-1 FVSKMDGTGLSQFSHDLTEHNAAVRKYILKKX 
310 320 330 



Homology with a predicted ORF from N. gonorrhoeae 

ORF7 shows 94.7% identity over a 187aa overlap with a predicted ORF (ORF7.ng) from N. 
gonorrhoeae: 



orf 7 MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 60 

I I I I I I I I I I I I M I I I I I I I I II M I I I I I I I I || i I II I I I I I II II I M I I 

orf 7ng MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQ 60 

orf7 FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGLPYKNPYEMLIMAXLVEKETG 120 

I I M I I I I I I I I I I M I I I I I I I I I II I I I I I I : I | | | | | | | | | | | | | | || | : | | | || 
orf7ng FFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEKETG 120 

orf 7 HEAXXDHVASVFVNRLKIGMRLQTXXSVIYGMGAAYKGKIRKADLRRDTPYNTYTRGGLP 180 

M I I I II I I I I I I I I I I I I I I ] II I I I I I I I I I I I I I I I I I I I I I I I 

orf 7ng HEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGGGLP 180 

orf7 PTPIALP 187 

II II I I 

orf7ng PTRIALPGKAAMDAAAHPSGEKYLYFVSKMDGTGLSQFSHDLTEHNAAVRKYILKK 23 6 
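
The identity figure quoted for this overlap can be reproduced by counting matching positions in the gap-free alignment. The following is a minimal sketch under stated assumptions: the function name `percent_identity` is hypothetical, the sequences are transcribed from the listings in this Example, and X residues are counted as mismatches, which is an assumption that happens to reproduce the quoted figure.

```python
def percent_identity(a, b):
    """Percent of positions with identical residues in two equal-length,
    gap-free aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# Partial ORF7 (SEQ ID 30); X marks undetermined residues.
ORF7 = (
    "MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAP"
    "DAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWESRQDGL"
    "PYKNPYEMLIMAXLVEKETGHEAXXDHVASVFVNRLKIGMRLQTXXSVIY"
    "GMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALP"
)

# First 187 residues of ORF7ng (SEQ ID 36), the region overlapping ORF7.
ORF7NG_187 = (
    "MRGGRPDSVTVQIIEGSRFSHMRKVIDATPDIGHDTKGWSNEKLMAEVAP"
    "DAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGL"
    "PYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSVIY"
    "GMGAAYKGKIRKADLRRDTPYNTYTGGGLPPTRIALP"
)

identity = percent_identity(ORF7, ORF7NG_187)
print(round(identity, 1))
```

Ten of the 187 aligned positions differ, so the computed value rounds to the 94.7% stated above.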

An ORF7ng nucleotide sequence <SEQ ID 35> is predicted to encode a protein having amino acid 
sequence <SEQ ID 36>: 



1 MRGGRPDSVT VQIIEGSRFS HMRKVIDATP DIGHDTKGWS NEKLMAEVAP 

51 DAFSGNPEGQ FFPDSYEIDA GGSDLQIYQT AYKAMQRRLN EAWAGRQDGL 

101 PYKNPYEMLI MASLIEKETG HEADRDHVAS VFVNRLKIGM RLQTDPSVIY 

151 GMGAAYKGKI RKADLRRDTP YNTYTGGGLP PTRIALPGKA AMDAAAHPSG 

2 01 EKYLYFVSKM DGTGLSQFSH DLTEHNAAVR KYILKK* 

Further sequence analysis revealed a partial DNA sequence of ORF7ng <SEQ ID 37>: 

1 . . taccgaatca AGATTGCCAA AAATCAGGGT ATTTCGTCGG TCGGCAGGAA 
51 ACTTGCcgaA GACCGCATCG TGTTCAGCAG GCATGTTTTG ACAGCGGCGG 
101 CCTACGTTTT GGGTGTGCAC AACAGGCTGC ATACGGGGAC gTACAGATTG 
151 CCTTCGGAAG TGTCTGCTTG GGATATCTTG CAGAAAATGC GCGGCGGCAG 
201 GCCGGATTCC GTTACCGTGC AGATTATCGA AGGTTCGCGT TTTTCGCATA 
251 TGAGGAAAGT CATCGACGCA ACGCCCGACA TCGGACACGA CACCAAAGGC 
301 TGGAGCAATG AAAAACTGAT GGCGGAAGTT GCGCCCGATG CCTTCAGCGG 

351 CAATCCTGAA GGGCAGTTTT TTCCCGACAG CTACGAAATC GATGCGGGCG 
4 01 GCAGCGATTT GCAGATTTAC CAAACCGCCT ACAAGGCGAT GCAACGCCGC 
451 CTGAACGAGG CATGGGCAGG CAGGCAGGAC GGGCTGCCTT ATAAAAACCC 
501 TTATGAAATG CTGATTATGG CGAGCCTGAT CGAAAAGGAA ACGGGGCATG 
551 AGGCCGACCG CGACCATGTC GCTTCCGTCT TCGTCAACCG CCTGAAAATC 
601 GGTATGCGCC TGCAAACCGA CCCGTCCGTG ATTTACGGCA TGGGTGCGGC 
651 ATACAAGGGC AAAATCCGTA AAGCCGACCT GCGCCGCGAC ACGCCGTACA 
701 aCAccTAtac gggcgggggc ttgccgccaa cccggattgc gctgcccggC 
751 Aaggcggcaa tggatgccgc cgcccacccg tccggcgaAa aatacctgTa 
801 tttcgtgtcC AAAATGGACG GCACGGGCTT GAGCCAGTTC AGCCATGATT 
851 TGACCGAACA CAACGCCGCc gTcCGCAAAT ATATTTTGAA AAAATAA 

This corresponds to the amino acid sequence <SEQ ID 38; ORF7ng-1>: 



1 YRIKIAKNQG ISSVGRKLAE DRIVFSRHVL TAAAYVLGVH NRLHTGTYRL 
51 PSEVSAWDIL QKMRGGRPDS VTVQIIEGSR FSHMRKVIDA TPDIGHDTKG 
101 WSNEKLMAEV APDAFSGNPE GQFFPDSYEI DAGGSDLQIY QTAYKAMQRR 
151 LNEAWAGRQD GLPYKNPYEM LIMASLIEKE TGHEADRDHV ASVFVNRLKI 
201 GMRLQTDPSV IYGMGAAYKG KIRKADLRRD TPYNTYTGGG LPPTRIALPG 
251 KAAMDAAAHP SGEKYLYFVS KMDGTGLSQF SHDLTEHNAA VRKYILKK* 

ORF7ng-1 and ORF7-1 show 98.0% identity in 298 aa overlap: 



                 10        20        30        40        50        60
orf7-1.pep  KLLKWSAVFLTVSAAVFAALLFVPKDNGRAYRIKIAKNQGISSVGRKLAEDRIVFSRHVL
                                          ||||||||||||||||||||||||||||||
orf7ng-1                                  YRIKIAKNQGISSVGRKLAEDRIVFSRHVL
                                                  10        20        30

                 70        80        90       100       110       120
orf7-1.pep  TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf7ng-1    TAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPDSVTVQIIEGSRFSHMRKVIDA
                    40        50        60        70        80        90

                130       140       150       160       170       180
orf7-1.pep  TPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRR
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf7ng-1    TPDIGHDTKGWSNEKLMAEVAPDAFSGNPEGQFFPDSYEIDAGGSDLQIYQTAYKAMQRR
                   100       110       120       130       140       150

                190       200       210       220       230       240
orf7-1.pep  LNEAWESRQDGLPYKNPYEMLIMASLVEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV
            ||||| :|||||||||||||||||||:|||||||||||||||||||||||||||||||||
orf7ng-1    LNEAWAGRQDGLPYKNPYEMLIMASLIEKETGHEADRDHVASVFVNRLKIGMRLQTDPSV
                   160       170       180       190       200       210

                250       260       270       280       290       300
orf7-1.pep  IYGMGAAYKGKIRKADLRRDTPYNTYTRGGLPPTPIALPGKAALDAAAHPSGEKYLYFVS
            ||||||||||||||||||||||||||| |||||| ||||||||:||||||||||||||||
orf7ng-1    IYGMGAAYKGKIRKADLRRDTPYNTYTGGGLPPTRIALPGKAAMDAAAHPSGEKYLYFVS
                   220       230       240       250       260       270

                310       320       330
orf7-1.pep  KMDGTGLSQFSHDLTEHNAAVRKYILKKX
            |||||||||||||||||||||||||||||
orf7ng-1    KMDGTGLSQFSHDLTEHNAAVRKYILKKX
                   280       290



In addition, ORF7ng-1 shows significant homology with a hypothetical E. coli protein: 

sp|P28306|YCEG_ECOLI HYPOTHETICAL 38.2 KD PROTEIN IN PABC-HOLB INTERGENIC REGION 
gi|1787339 (AE000210) o340; 100% identical to fragment YCEG_ECOLI SW: P28306 but 
has 97 additional C-terminal residues [Escherichia coli] Length = 340 

Score = 79 (36.2 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57 
Identities = 20/87 (22%), Positives = 40/87 (45%) 

Query: 10  GISSVGRKLAEDRIVFSRHVLTAAAYVLGVHNRLHTGTYRLPSEVSAWDILQKMRGGRPD 69
           G  ++G +L  D+I+    V      +    +    GTYR   +++  ++L+ +  G+
Sbjct: 49  GRLALGEQLYADKIINRPRVFQWLLRIEPDLSHFKAGTYRFTPQMTVREMLKLLESGKEA 108

Query: 70  SVTVQIIEGSRFSHMRKVIDATPDIGH 96
              ++++EG R S   K +   P I H
Sbjct: 109 QFPLRLVEGMRLSDYLKQLREAPYIKH 135

Score = 438 (200.7 bits), Expect = 5.0e-57, Sum P(2) = 5.0e-57 
Identities = 84/155 (54%), Positives = 111/155 (71%) 

Query: 120 EGQFFPDSYEIDAGGSDLQIYQTAYKAMQRRLNEAWAGRQDGLPYKNPYEMLIMASLIEK 179
           EG F+PD++   A  +D+ + + A+K M + ++ AW GR DGLPYK+  +++ MAS+IEK
Sbjct: 158 EGWFWPDTWMYTANTTDVALLKRAHKKMVKAVDSAWEGRADGLPYKDKNQLVTMASIIEK 217

Query: 180 ETGHEADRDHVASVFVNRLKIGMRLQTDPSVIYGMGAAYKGKIRKADLRRDTPYNTYTGG 239
           ET   ++RD VASVF+NRL+IGMRLQTDP+VIYGMG  Y GK+ +ADL   T YNTYT
Sbjct: 218 ETAVASERDKVASVFINRLRIGMRLQTDPTVIYGMGERYNGKLSRADLETPTAYNTYTIT 277

Query: 240 GLPPTRIALPGKAAMDAAAHPSGEKYLYFVSKMDG 274
           GLPP  IA PG  ++ AAAHP+   YLYFV+   G
Sbjct: 278 GLPPGAIATPGADSLKAAAHPAKTPYLYFVADGKG 312





Based on this analysis, including the fact that the H. influenzae YCEG protein possesses a possible 
leader sequence, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 6 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 39>: 

1 CGTTTCAAAA TGTTAACTGT GTTGACGGCA ACCTTGATTG CCGGACAGGT 

51 ATCTGCCGCC GGAGGCGGTG CGGGGGATAT GAAACAGCCG AAGGAAGTCG 

101 GAAAGGTTTT CAGAAAGCAG CAGCGTTACA GCGAGGAAGA AATCAAAAAC 

151 GAACGCGCAC GGCTTGCGGC AGTGGGCGAG CGGGTTAATC AGATATTTAC 

201 GTTGCTGGGA GGGGAAACCG CCTTGCAAAA GGGGCAGGCG GGAACGGCTC 

251 TGGCAACCTA TATGCTGATG TTGGAACGCA CAAAATCCCC CGAAGTCGCC 

301 GAACGCGCCT TGGAAATGGC CGTGTCGCTG AACGCGTTTG AACAGGCGGA 
351 AATGATTTAT CAGAAATGGC GGCAGATTGA GCCTATACCG GGTAAGGCGC 
401 AAAAACGGGC GGGGTGGCTG CGGAACGTGC TGAGGGAAAG AGGAAATCAG 

451 CATCTGGACG GACGGGAAGA AGTGCTGGCT CAGGCGGACG AAGGACAG 

This corresponds to the amino acid sequence <SEQ ID 40; ORF9>: 

1 ..RFKMLTVLTA TLIAGQVSAA GGGAGDMKQP KEVGKVFRKQ QRYSEEEIKN 
51 ERARLAAVGE RVNQIFTLLG GETALQKGQA GTALATYMLM LERTKSPEVA 
101 ERALEMAVSL NAFEQAEMIY QKWRQIEPIP GKAQKRAGWL RNVLRERGNQ 

151 HLDGREEVLA QADEGQ 
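
The correspondence between a DNA listing and its amino acid sequence can be checked by translating codons with the standard genetic code. The sketch below is an illustration, not the software used in this work: the names `CODON_TABLE` and `translate` are hypothetical, and rendering any codon that contains an undetermined base as X is an assumption modelled on the X residues in the listings.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1, ignoring whitespace and case.
    Codons containing anything other than A, C, G or T are rendered as X;
    an incomplete trailing codon is dropped."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First row of the partial DNA sequence <SEQ ID 39> listed above.
first_row = "CGTTTCAAAA TGTTAACTGT GTTGACGGCA ACCTTGATTG CCGGACAGGT"
print(translate(first_row))
```

The translated residues agree with the start of the ORF9 amino acid sequence shown above (RFKMLTVLTA TLIAGQ).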

Further sequence analysis revealed the complete DNA sequence <SEQ ID 41>: 

1 ATGTTACCTA ACCGTTTCAA AATGTTAACT GTGTTGACGG CAACCTTGAT 

51 TGCCGGACAG GTATCTGCCG CCGGAGGCGG TGCGGGGGAT ATGAAACAGC 

101 CGAAGGAAGT CGGAAAGGTT TTCAGAAAGC AGCAGCGTTA CAGCGAGGAA 

151 GAAATCAAAA ACGAACGCGC ACGGCTTGCG GCAGTGGGCG AGCGGGTTAA 

201 TCAGATATTT ACGTTGCTGG GAGGGGAAAC CGCCTTGCAA AAGGGGCAGG 

251 CGGGAACGGC TCTGGCAACC TATATGCTGA TGTTGGAACG CACAAAATCC 

301 CCCGAAGTCG CCGAACGCGC CTTGGAAATG GCCGTGTCGC TGAACGCGTT 

351 TGAACAGGCG GAAATGATTT ATCAGAAATG GCGGCAGATT GAGCCTATAC 

401 CGGGTAAGGC GCAAAAACGG GCGGGGTGGC TGCGGAACGT GCTGAGGGAA 

451 AGAGGAAATC AGCATCTGGA CGGACTGGAA GAAGTGCTGG CTCAGGCGGA 

501 CGAAGGACAG AACCGCAGGG TGTTTTTATT GTTGGCACAA GCCGCCGTGC 

551 AACAGGACGG GTTGGCGCAA AAAGCATCGA AAGCGGTTCG CCGCGCGGCG 

601 TTGAAATATG AACATCTGCC CGAAGCGGCG GTTGCCGATG TGGTGTTCAG 

651 CGTACAGGGA CGCGAAAAGG AAAAGGCAAT CGGAGCTTTG CAGCGTTTGG 

701 CGAAGCTCGA TACGGAAATA TTGCCCCCCA CTTTAATGAC GTTGCGTCTG 

751 ACTGCACGCA AATATCCCGA AATACTCGAC GGCTTTTTCG AGCAGACAGA 

801 CACCCAAAAC CTTTCGGCCG TCTGGCAGGA AATGGAAATT ATGAATCTGG 

851 TTTCCCTGCA CAGGCTGGAT GATGCCTATG CGCGTTTGAA CGTGCTGTTG 
901 GAACGCAATC CGAATGCAGA CCTGTATATT CAGGCAGCGA TATTGGCGGC 

951 AAACCGAAAA GAAGGTGCTT CCGTTATCGA CGGCTACGCC GAAAAGGCAT 

1001 ACGGCAGGGG GACGGAGGAA CAGCGGAGCA GGGCGGCGCT AACGGCGGCG 

1051 ATGATGTATG CCGACCGCAG GGATTACGCC AAAGTCAGGC AGTGGCTGAA 

1101 AAAAGTATCC GCGCCGGAAT ACCTGTTCGA CAAAGGTGTG CTGGCGGCTG 

1151 CGGCGGCTGT CGAGTTGGAC GGCGGCAGGG CGGCTTTGCG GCAGATCGGC 

1201 AGGGTGCGGA AACTTCCCGA ACAGCAGGGG CGGTATTTTA CGGCAGACAA 

1251 TTTGTCCAAA ATACAGATGC TCGCCCTGTC GAAGCTGCCC GATAAACGGG 

1301 AGGCTTTGAG GGGGTTGGAC AAGATTATCG AAAAACCGCC TGCCGGCAGT 

1351 AATACAGAGT TACAGGCAGA GGCATTGGTA CAGCGGTCAG TTGTTTACGA 

1401 TCGGCTTGGC AAGCGGAAAA AAATGATTTC AGATCTTGAA AGGGCGTTCA 

1451 GGCTTGCACC CGATAACGCT CAGATTATGA ATAATCTGGG CTACAGCCTG 

1501 CTGACCGATT CCAAACGTTT GGACGAAGGT TTCGCCCTGC TTCAGACGGC 

1551 ATACCAAATC AACCCGGACG ATACCGCTGT CAACGACAGC ATAGGCTGGG 

1601 CGTATTACCT GAAAGGCGAC GCGGAAAGCG CGCTGCCGTA TCTGCGGTAT 

1651 TCGTTTGAAA ACGACCCCGA GCCCGAAGTT GCCGCCCATT TGGGCGAAGT 

1701 GTTGTGGGCA TTGGGCGAAC GCGATCAGGC GGTTGACGTA TGGACGCAGG 

1751 CGGCACACCT TACGGGAGAC AAGAAAATAT GGCGGGAAAC GCTCAAACGT 

1801 CACGGCATCG CATTGCCCCA ACCTTCCCGA AAACCTCGGA AATAA 

This corresponds to the amino acid sequence <SEQ ID 42; ORF9-1>: 

1 MLPNRFKMLT VLTATLIAGQ VSAAGGGAGD MKQPKEVGKV FRKQQRYSEE 
51 EIKNERARLA AVGERVNQIF TLLGGETALQ KGQAGTALAT YMLMLERTKS 
101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGKAQKR AGWLRNVLRE 
151 RGNQHLDGLE EVLAQADEGQ NRRVFLLLAQ AAVQQDGLAQ KASKAVRRAA 
201 LKYEHLPEAA VADVVFSVQG REKEKAIGAL QRLAKLDTEI LPPTLMTLRL 
251 TARKYPEILD GFFEQTDTQN LSAVWQEMEI MNLVSLHRLD DAYARLNVLL 
301 ERNPNADLYI QAAILAANRK EGASVIDGYA EKAYGRGTEE QRSRAALTAA 
351 MMYADRRDYA KVRQWLKKVS APEYLFDKGV LAAAAAVELD GGRAALRQIG 
401 RVRKLPEQQG RYFTADNLSK IQMLALSKLP DKREALRGLD KIIEKPPAGS 
451 NTELQAEALV QRSVVYDRLG KRKKMISDLE RAFRLAPDNA QIMNNLGYSL 
501 LTDSKRLDEG FALLQTAYQI NPDDTAVNDS IGWAYYLKGD AESALPYLRY 
551 SFENDPEPEV AAHLGEVLWA LGERDQAVDV WTQAAHLTGD KKIWRETLKR 
601 HGIALPQPSR KPRK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF9 shows 89.8% identity over a 166aa overlap with an ORF (ORF9a) from strain A of N. 
meningitidis: 



10 20 30 40 50 

orf 9 . pep RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA 
I I : I : I I : I : I : I I I : II I I : I I I I ! I M I I I I I I I II I I I I I I M I I I I 
orf 9a MLPARFTILSVLAAALLAGQAYAA--GAADAKPPKEVGKVFRKQQRYSEEEIKNERARLA 
10 20 30 40 50 

60 70 80 90 100 110 

orf 9 . pep AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

M I I I I I I I I I I I I I I I I I I I I I I I I || I I I I I II I I I I I I I II I I I II I I I I I 

orf 9a AVGERVNQIFTLLGXETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

60 70 80 90 100 110 

120 130 140 150 160 

orf 9 .pep EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGREEVLAQADEGQ 
I M I I I I M I I I I I M I M I M M I M I I I I I 1 I N I I M MINI I 
orf 9a EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEXLAQADEXQNRRVFLLLAQ 
120 130 140 150 160 170 

orf9a AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADVVFSVQXREKEKAIGALQRLAKLDTEI 

180 190 200 210 220 230 

The complete length ORF9a nucleotide sequence <SEQ ID 43> is: 

1 ATGTTACCCG CCCGTTTCAC CATTTTATCT GTGCTCGCGG CAGCCCTGCT 
51 TGCCGGGCAG GCGTATGCCG CCGGCGCGGC GGATGCGAAG CCGCCGAAGG 
101 AAGTCGGAAA GGTTTTCAGA AAGCAGCAGC GTTACAGCGA GGAAGAAATC 
151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAGCGGG TTAATCAGAT 
201 ATTTACGTTG CTGGGANGGG AAACCGCCTT GCAAAAGGGG CAGGCGGGAA 
251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA 
301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCNCTGAACG CGTTTGAACA 
351 GGCGGAAATG ATTTATCAGA AATGGCGGCA GATTGAGCCT ATACCGGGTA 
401 AGGCGCAAAA ACGGGCGGGG TGGCTGCGGA ACGTGCTGAG GGAAAGAGGA 
451 AATCAGCATC TAGACGGACT GGAAGAANTG CTGGCTCAGG CGGACGAANG 
501 ACAGAACCGC AGGGTGTTTT TATTGTTGGC ACAAGCCGCC GTGCAACAGG 
551 ACGGGTTGGC GCAAAAAGCA TCGAAAGCGG TTCGCCGCGC GGCGTTGAGA 
601 TATGAACATC TGCCCGAAGC GGCGGTTGCC GATGTGGTGT TCAGCGTACA 
651 GGNACGCGAA AAGGAAAAGG CAATCGGAGC TTTGCAGCGT TTGGCGAAGC 
701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA 
751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA 
801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC 
851 TGCACAGGCT GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACGC 
901 AATCCGAATG CAGACCTGTA TATTCAGGCA GCGATATTGG CGGCAAACCG 
951 AAAAGAANGT GCTTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA 
1001 GGGGGACGGG GGAACAGCGG GGCAGGGCGG CAATGACGGC GGCGATGATA 
1051 TATGCCGACC GAAGGGATTA CACCAAAGTC AGGCAGTGGT TGAAAAAAGT 
1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG TGTGCTGGCG GCTGCGGCGG 
1151 CTGTCGAGTT GGACNGCGGC AGGGCGGCTT TGCGGCAGAT CGGCAGGGTG 
1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC 
1251 CAAAATACAG ATGTTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAGGCTT 
1301 TGAGGGGGTT GGACAAGATT ATCGAAAAAC CGCCTGCCGG CAGTAATACA 
1351 GAGTTACAGG CAGAGGCATT GGTACAGCGG TCAGTTGTTT ACGATCGGCT 
1401 TGGCAAGCGG AAAAAAATGA TTTCAGATCT TGAAAGGGCG TTCAGGCTTG 
1451 CACCCGATAA CGCTCAGATT ATGAATAATC TGGGCTACAG CCTGCTTTCC 
1501 GATTCCAAAC GTTTGGACGA AGGCTTCGCC CTGCTTCAGA CGGCATACCA 
1551 AATCAACCCG GACGATACCG CTGTCAACGA CAGCATAGGC TGGGCGTATT 
1601 ACCTGAAANG CGACGCGGAA AGCGCGCTGC CGTATCTGCG GTATTCGTTT 
1651 GAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG 
1701 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC 
1751 ACCTTACGGG AGACAAGAAA ATATGGCGGG AAACGCTCAA ACGTCACGGC 
1801 ATCGCATTGC CCCAACCTTC CCGAAAACCT CGGAAATAA 



This encodes a protein having amino acid sequence <SEQ ID 44>: 



1 MLPARFTILS VLAAALLAGQ AYAAGAADAK PPKEVGKVFR KQQRYSEEEI 
51 KNERARLAAV GERVNQIFTL LGXETALQKG QAGTALATYM LMLERTKSPE 
101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP IPGKAQKRAG WLRNVLRERG 
151 NQHLDGLEEX LAQADEXQNR RVFLLLAQAA VQQDGLAQKA SKAVRRAALR 
201 YEHLPEAAVA DVVFSVQXRE KEKAIGALQR LAKLDTEILP PTLMTLRLTA 
251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLHRLDDA YARLNVLLER 
301 NPNADLYIQA AILAANRKEX ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI 
351 YADRRDYTKV RQWLKKVSAP EYLFDKGVLA AAAAVELDXG RAALRQIGRV 
401 RKLPEQQGRY FTADNLSKIQ MFALSKLPDK REALRGLDKI IEKPPAGSNT 
451 ELQAEALVQR SVVYDRLGKR KKMISDLERA FRLAPDNAQI MNNLGYSLLS 
501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKXDAE SALPYLRYSF 
551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLTGDKK IWRETLKRHG 
601 IALPQPSRKP RK* 



ORF9a and ORF9-1 show 95.3% identity in 614 aa overlap: 



MLPARFT I L S VLAAALLAGQAYAAG — AADAKPPKEVGKVFRKQQRYSEEEIKNERARLA 
Ml M : f : 1 1 : I : I : I I I : III I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
MLPNRFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERARLA 
10 20 30 40 50 60 

60 70 80 90 100 110 

AVGERVNQIFTLLGXETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

I I I ! I I I I I I I I I I I I I I I I I I I I I I I 

AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 
70 80 90 100 110 120 

120 130 140 150 160 170 
EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEXLAQADEXQNRRVFLLLAQ 
I I I I I I I I I I I I I I I I | 
orf9-1 EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEVLAQADEGQNRRVFLLLAQ 
130 140 150 160 170 180 

180 190 200 210 220 230 

5 orf9a pep AAVQQDGLAQKASKAVRRAALRYEHLPEAAVADWFSVQXREKEKAIGALQRIAKLDTEI 

5 orf9a.pep , , , , , , , , , , , , , , , , t | | : I I! II INI II I I I I I I I I I I I I I I I I I I I I 

orf9-l AAVQQDGLAQKASKAVRRAALKYEHLPEAAVADWFSVQGREKEKAIGALQRLAKLDTEI 
190 200 210 220 230 240 

in 240 250 260 270 280 290 

orf9a pep LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 

| | | | | | M I M I I I I i I II I I M I I II II I I I I I I I I I I I I I I I M 

orf 9-1 lpptlmTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLHRLDDAYARLNVLL 
250 260 270 280 290 300 

15 300 310 320 330 340 350 

orf9a pep ERNPNADLYIQAAILAANRKEXASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYT 
|| | | | | M | I I I I I I M I I I I I I I I I I I I I I I I I I M I I I : I I I : I I I I : I I I I I I I : 
orf 9-1 ERNPNADLYIQAAILAANRKEGASVIDGYAEKAYGRGTEEQRSRAALTAAMMYADRRDYA 

20 310 320 330 340 350 360 

360 370 380 390 400 410 

orf9a pep KVRQWLKKVSAPEYLFDKGVLAAAAAVELDXGRAALRQIGRVRKLPEQQGRYFTADNLSK 
| | | | || I ! I ! I I I I I i I I I I I II I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I M 
25 orf9-l KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 

370 380 390 400 410 420 

420 430 440 450 460 470 

or f 9 a pep IQMFAL SKLPDKREALRGLDKI IEKPPAG SNTELQAEALVQRS WYDRLGKRKKMI SDLE 
30 I I I : I I I I I I I M I I I I I M I I I I I I I I I I I I I I I I I I I I I M I I I I M I I I I I I I M I I 

orf 9-1 IQMLALSKLPDKREALRGLDKI IEKPPAG SNTELQAEALVQRSWYDRLGKRKKMI SDLE 

430 440 450 460 470 480 

480 490 500 510 520 530 

35 orf 9a . pep RAFRLAPDNAQIMNNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKXD 

orf 9-1 RAFRLAPDNAQIMNNLGYSLLTDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD 
490 500 510 520 530 540 

40 540 550 560 570 580 590 

orf 9a . pep AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 

I I I i I II I I I I I I I I I 1 I I I I M II I I I I I II II I II I I I I I I I I I I I I I I I I M 

orf 9-1 AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLTGDKKIWRETLKR 
550 560 570 580 590 600 

45 

600 610 
orf 9a. pep HGIALPQPSRKPRKX 



50 



orf9-l HGIALPQPSRKPRKX 
610 

Homology with a predicted ORF from N. gonorrhoeae 

ORF9 shows 82.8% identity over a 163aa overlap with a predicted ORF (ORF9.ng) from N. 
gonorrhoeae: 

0r f9 RFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEEIKNERAR 54 

II : I : 1 I : I : I : I II : II I I : 1 : : I I I I I I I : I I : : I I I I I II I I M I I 
orf9ng MIMLPARFTILSVLAAALLAGQAYAA — GAADVELPKEVGKVLRKHRRYSEEEIKNERAR 58 

or f 9 LAAVGERVNQI FTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFE 114 

I I I I I I I I I : : I II II I I I I i I I I I I I I I I I I I I I ! I II I I I I I I I I I I I I I II M M I I 
or f 9ng LAAVGERVNRVFT LLGGETALQKGQAGTALATYMLMLERTKS PEVAERALEMAVS LNAFE 118 

orf 9 QAEMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGREEVLAQADEGQ 166 

I II II I I I I I I I I I I 11 : II I 11111111:1 II 111 III I I : I 
orf 9ng QAEMIYQKWRQIEPIPGEAQKPAGWLRNVLKEGGNPHLDRLEEVPAQSDYVHQPMIFLLL 17 8 
The ORF9ng nucleotide sequence <SEQ ID 45> was predicted to encode a protein including amino 
acid sequence <SEQ ID 46>: 

1 MIMLPARFTI LSVLAAALLA GQAYAAGAAD VELPKEVGKV LRKHRRYSEE 
51 EIKNERARLA AVGERVNRVF TLLGGETALQ KGQAGTALAT YMLMLERTKS 
101 PEVAERALEM AVSLNAFEQA EMIYQKWRQI EPIPGEAQKP AGWLRNVLKE 
151 GGNPHLDRLE EVPAQSDYVH QPMIFLLLVQ AAVQHGGVAQ KPSKAVRPAA 
201 YNYEVLPETA GADAVFCVQG PQYEKAIQSF PPCGRNPQTE NIAPPFNELF 
251 RPTARPISPK LLQRFFRTEP NLAKPFRPPG PEMETYQTGF PRPLTRNNPT 

Amino acids 1-28 are a putative leader sequence, and 173-189 are predicted to be a transmembrane 
domain. 
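
The residue ranges named above can be extracted from the listed sequence by 1-based slicing, and the hydrophobic character of the second segment can be gauged with a simple average. This is a hedged sketch only: the helper names `segment` and `mean_hydropathy` are hypothetical, and the use of the Kyte-Doolittle hydropathy scale is an assumption for illustration, since the specification does not state which prediction method was used.

```python
# Kyte-Doolittle hydropathy values (used here purely for illustration).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def segment(seq, start, end):
    """Return residues start..end of seq, using 1-based inclusive numbering."""
    return seq[start - 1:end]


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle hydropathy over the residues of a peptide."""
    return sum(KD[res] for res in peptide) / len(peptide)


# First 200 residues of <SEQ ID 46>, as listed above.
ORF9NG = (
    "MIMLPARFTILSVLAAALLAGQAYAAGAADVELPKEVGKVLRKHRRYSEE"
    "EIKNERARLAAVGERVNRVFTLLGGETALQKGQAGTALATYMLMLERTKS"
    "PEVAERALEMAVSLNAFEQAEMIYQKWRQIEPIPGEAQKPAGWLRNVLKE"
    "GGNPHLDRLEEVPAQSDYVHQPMIFLLLVQAAVQHGGVAQKPSKAVRPAA"
)

leader = segment(ORF9NG, 1, 28)
tm = segment(ORF9NG, 173, 189)
print(leader)
print(tm, round(mean_hydropathy(tm), 2))
```

On this scale a positive average indicates a predominantly hydrophobic stretch, which is consistent with, but does not by itself establish, a membrane-spanning segment.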



Further sequence analysis revealed the complete length ORF9ng DNA sequence <SEQ ID 47>: 

1 ATGTTACCCG CCCGTTTCAC TATTTTATCT GTCCTCGCAG CAGCCCTGCT 

51 TGCCGGACAG GCGTATGCTG CCGGCGCGGC GGATGTGGAG CTGCCGAAGG 

101 AAGTCGGAAA GGTTTTAAGG AAACATCGGC GTTACAGCGA GGAAGAAATC 

151 AAAAACGAAC GCGCACGGCT TGCGGCAGTG GGCGAACGGG TCAACAGGGT 

201 GTTTACGCTG TTGGGCGGTG AAACGGCTTT GCAGAAAGGG CAGGCGGGAA 

251 CGGCTCTGGC AACCTATATG CTGATGTTGG AACGCACAAA ATCCCCCGAA 

301 GTCGCCGAAC GCGCCTTGGA AATGGCCGTG TCGCTGAACG CGTTTGAACA 

351 GGCGGAAATG ATTTATCAGA AATGgcggca gatcgagcct ataCcgggtg 

401 aggcgcaaaa accgGcgggG tggctgcgga acgtattgaa ggaagggGGa 

4 51 aaTCAGCATC TGGAcgggtt gaaagaggTG CtggcgcaAT cggacgatGT 

501 GCAAAAAcgc aggaTATTTT TGCTGCTGGT GCAAGCCGCC GTGCagcagg 

551 gTGGGGTGGC TCAAAAAGCA TCGAAAGCGG TTCGCcgtgc GGcgttgaAG 

601 TATGAACATC TGCCcgaagc ggcggTTGCC GATGcggTGT TCGGCGTACA 

651 GGGACGCGAA AAGGAAAagg caaTCGAAGC TTTGCAGCGT TTGGCGAAGC 

701 TCGATACGGA AATATTGCCC CCCACTTTAA TGACGTTGCG TCTGACTGCA 

751 CGCAAATATC CCGAAATACT CGACGGCTTT TTCGAGCAGA CAGACACCCA 

801 AAACCTTTCG GCCGTCTGGC AGGAAATGGA AATTATGAAT CTGGTTTCCC 

851 TGCGTAAGCC GGATGATGCC TATGCGCGTT TGAACGTGCT GTTGGAACAC 

901 AACCCGAATG CAAACCTGTA TATT CAGGCG GCGATATTGG CGGCAAACCG 

951 AAAAGAAGGT GCGTCCGTTA TCGACGGCTA CGCCGAAAAG GCATACGGCA 

1001 GGGGGACGGG GGAACAGCGG GGCagggcgg cAATgacggc GGCGATGATA 

1051 TATGCCGACC GCAGGGATTA CGCCAAAGTC AGGCAGTGGT TGAAAAAAGT 

1101 GTCCGCGCCG GAATACCTGT TCGACAAAGG CGTGCTGGCG GCTGCGGCGG 

1151 CTGCCGAATT GGACGGAGGC CGGGCGGCTT TGCGGCAGAT CGGCAGGGTG 

1201 CGGAAACTTC CCGAACAGCA GGGGCGGTAT TTTACGGCAG ACAATTTGTC 

1251 CAAAATACAG ATGCTCGCCC TGTCGAAGCT GCCCGACAAA CGGGAAGCCC 

13 01 TGATCGGGCT GAACAACATC ATCGCCAAAC TTTCGGCGGC GGGAAGCACG 
1351 GAACCTTTGG CGGAAGCATT GGCACAGCGT TCCATTATTT ACGaacAGTT 

14 01 cggCAAACGG GGAAAAATGA TTGCCGACCT tgaAACcgcg CTCAAACTTA 
14 51 CGCCCGATAA TGCACAAATT ATGAATAATC TGGGCTACAG CCTGCTTTCC 
1501 GATTCCAAAC GTTTGGACGA GGGTTTCGCC CTGCTTCAGA CGGCATACCA 
1551 AATCAACCCG GACGATACCG CCGTTAACGA CAGCATAGGC TGGGCGTATT 
1601 ACCTGAAAGG CGACgcggaA AGCGCGCTGC CGTATCTGcg gtattcgttt 
1651 gAAAACGACC CCGAGCCCGA AGTTGCCGCC CATTTGGGCG AAGTGTTGTG 
17 01 GGCATTGGGC GAACGCGATC AGGCGGTTGA CGTATGGACG CAGGCGGCAC 

17 51 ACCTTAGGGG AGACAAGAAA ATATGGCGGG AGACGCTCAA ACGCTACGGA 

18 01 ATCGCCTTGC CCGAGCCTTC CCGAAAACCC CGGAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 48>: 

1 MLPARFTILS VLAAALLAGQ AYAAGAA DVE LPKEVGKVLR KHRRYSEEEI 

51 KNERARLAAV GERVNRVFTL LGGETALQKG QAGTALATYM LMLERTKSPE 

101 VAERALEMAV SLNAFEQAEM IYQKWRQIEP I PGE AQKPAG WLRNVLKEGG 

151 NQHLDGLKEV LAQSDDVQKR RIFLLLVQAA VQQGGVAQKA SKAVRRAALK 

2 01 YEHL PEAAVA DAVFGVQGRE KEKAIEALQR LAKLDTEILP PTLMTLRLTA 
251 RKYPEILDGF FEQTDTQNLS AVWQEMEIMN LVSLRKPDDA YARLNVLLEH 

3 01 NPNANLYIQA AILAANRKEG ASVIDGYAEK AYGRGTGEQR GRAAMTAAMI 
351 YADRRDYAKV RQWLKKVSAP EYLFDKGVLA AAAAAELDGG RAALRQIGRV 

4 01 RKLPEQQGRY FTADNLSKIQ MLALSKLPDK REALIGLNNI IAKLSAAGST 
4 51 EPLAEALAQR SIIYEQFGKR GKMIADLETA LKLTPDNAQI MNNLGYSLLS 
501 DSKRLDEGFA LLQTAYQINP DDTAVNDSIG WAYYLKGDAE SALPYLRYSF
551 ENDPEPEVAA HLGEVLWALG ERDQAVDVWT QAAHLRGDKK IWRETLKRYG 
601 IALPEPSRKP RK* 
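
The relationship between <SEQ ID 47> and <SEQ ID 48> can be spot-checked with a short translation sketch. The code below is an illustration only, not the software used for the original analysis. It builds the standard genetic code table, ignores the spacing, position numbers and letter case used in the listing, and renders any codon containing a non-ACGT symbol as X. The function name `translate` is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (standard genetic code).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; ambiguous codons become 'X', stops become '*'."""
    # Keep letters only, so position numbers and spaces in a listing are ignored.
    cleaned = "".join(ch for ch in dna.upper() if ch.isalpha())
    protein = []
    for i in range(0, len(cleaned) - 2, 3):
        protein.append(CODON_TABLE.get(cleaned[i:i + 3], "X"))
    return "".join(protein)


# First 30 and last 9 nucleotides of <SEQ ID 47>, copied from the listing.
print(translate("ATGTTACCCG CCCGTTTCAC TATTTTATCT"))  # MLPARFTILS
print(translate("CGGAAATAA"))                         # RK*
```

Both fragments agree with the start (MLPARFTILS) and end (RK*) of <SEQ ID 48>. A fuller treatment would resolve ambiguous codons whose possible translations all agree, which this sketch deliberately does not attempt.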

ORF9ng and ORF9-1 show 88.1% identity in 614 aa overlap: 

10 20 30 40 50 60 

or f 9- 1 pep MLPNRFKMLTVLTATLIAGQVSAAGGGAGDMKQPKEVGKVFRKQQRYSEEE IKNERARLA 
Ml || : | : M : I : I : I i I : III 1 : I : : I I I I I I I : I I :: I I I I I I I I I I I I I I I 
or f 9 nq- 1 MLPARFT ILSVLAAALLAGQAYAAG- -AADVELPKEVGKVLRKHRRYSEEE IKNERARLA 

10 20 30 40 50 

70 80 90 100 110 120 

orf 9-1 pep AVGERVNQIFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

| | | | | | | :: I I I I I 1 I M I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I 
orf 9ng-l AVGERVNRVFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAERALEMAVSLNAFEQA 

60 70 80 90 100 110 

130 140 150 160 170 180 

orf 9-1 pep EMIYQKWRQIEPIPGKAQKRAGWLRNVLRERGNQHLDGLEEVLAQADEGQNRRVFLLLAQ 

I I I I I I I 111:111 11111111:1 11111111:11111:1: I : I I : I I I I : I 

orf 9ng-l EMIYQKWRQIEPIPGEAQKPAGWLRNVLKEGGNQHLDGLKEVLAQSDDVQKRRIFLLLVQ 
120 130 140 150 160 170 

190 200 210 220 230 240 

orf9-l pep AAVQQDGLAQKASKAVRRAALKYEHLPEAAVADVVFSVQGREKEKAIGALQRLAKLDTEI 
I I I I I I : I II I I I I II I I I I I I I I M I I I I I I : I I : I I II I II II I I I II I I I I I I I I 
orf 9ng-l AAVQQGGVAQKASKAVRRAALKYEHLPEAAVADAVFGVQGREKEKAIEALQRLAKLDTEI 
180 190 200 210 220 230 

250 260 270 280 290 300 

orf 9-1 pep LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAWQEMEIMNLVSLHRLDDAYARLNVLL 
I | | I I I M I I II I I I I I i I I I I I I I I I M I I II I I I i I I I I I I I I I : : I I I I I I I I I I 1 
orf 9ng-l LPPTLMTLRLTARKYPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLRKPDDAYARLNVLL 

240 250 260 270 280 290 

310 320 330 340 350 360 

orf 9-1 . pep ERNPNADLYIQAAILAANRKEGASVIDGYAEKAYGRGTEEQRSRAALTAAMMYADRRDYA 
I : I I I 1 : 1 I I I I I 1 I I I I I I I I I I I I I I I I I I 1 II I I I I I I : I I I : I I I I : I I I I I I I I 
orf 9ng-l EHNPNANLYIQAAILAANRKEGASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYA 
300 310 320 330 340 350 

370 380 390 400 410 420 

orf 9-1 . pep KVRQWLKKVSAPEYLFDKGVLAAAAAVELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 
I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I 
orf 9ng-l KVRQWLKKVSAPEYLFDKGVLAAAAAAELDGGRAALRQIGRVRKLPEQQGRYFTADNLSK 
360 370 380 390 400 410 

430 440 450 460 470 480 

or f 9-1 . pep IQMLALSKLPDKREALRGLDKIIEKPPAGSNTELQAEALVQRSVVYDRLGKRKKMISDLE 



490 500 510 520 530 540 

RAFRLAPDNAQIMNNLGYSLLTDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD 
I :: I : I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
TALKLTPDNAQIMNNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGD 
480 490 500 510 520 530 

550 560 570 580 590 600 

AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDWTQAAHLTGDKKIWRETLKR 
I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
AESALPYLRYSFENDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLRGDKKIWRETLKR 
540 550 560 570 580 590 

610 

HGIALPQPSRKPRKX 
: I I I I I : I I I I I I I I 

YGIALPEPSRKPRKX 
600 610

In addition, ORF9ng shows significant homology with a hypothetical protein from P. aeruginosa: 

sp|P42810|YHE3_PSEAE HYPOTHETICAL 64.8 KD PROTEIN IN HEMM-HEMA INTERGENIC REGION
(ORF3)

>gi|1072999|pir||S49376 hypothetical protein 3 - Pseudomonas aeruginosa >gi|557259
(X82071) orf3 [Pseudomonas aeruginosa] Length = 576
Score = 128 bits (318), Expect = 1e-28

Identities = 138/587 (23%), Positives = 228/587 (38%), Gaps = 125/587 (21%) 

Query: 67 VFTLLGGETALQKGQAGTALATYMLMLERTKSPEVAEPALEMAVSLNAFEQAEMIYQKWR 126 

+++LL E A Q+ + AL+ Y++ ++T+ P V+ERA +A L A ++A W 
Sbjct: 53 L YS LLVAE LAGQRNRFDI AL SN YWQAQKTRDPGV SERAFRI AE YLGADQE ALDT S LLWA 112 

Query: 127 QIEPIPGEAQKPAG WLRNVLKEGGNQHLDGLKEVLAQSDDVQKRRI 172 

+ P +AQ+ A ++ VL G+ HDL A++D + + 

Sbjct: 113 RSAPDNLDAQRAAAIQLARAGRYEESMVYMEKVLNGQGDTHFDFLALSAAETDPDTRAGL 172 

Query: 173 FXXXXXXXXXXXXXXXR7ASKAVRRAALKYEHLPEAAVADAVFGVQGREKEKAIEALQRLA 2 32 

++ KY + + A+ Q ++A+ L+ + 

Sbjct: 173 L QSFDHLLKKYPNNGQLLFGKALLLQQDGRPDEALTLLEDNS 214 

Query: 233 KLDTEILPPTLMTLRLTARK YPEILDGFFEQTDTQNLSAVWQEMEIMNLVSLRKP 287 

E+PL+L + K P+GED + + + + LV + 
Sbjct: 215 ASRHEVAPLLLRSRLLQSMKRSDEALPLLKAGIKEHPDDKRVRLAYARL LVEQNRL 270 

Query: 288 DDAYARLNVLLEHNPN ANLYIQAAI 312 

DDA A L++ P+ A +Y++ + 

Sbjct: 271 DDAKAEFAGLVQQFPDDDDDLRFSLALVCLEAQAWDEARIYLEELVERDSHVDAAHFNLG 330 

Query: 313 -LAANRKEGASVIDGYAEKAYGRGTGEQRGRAAMTAAMIYADRRDYAKVRQWLKKVSAPE 371 

LA +K+ A +D YA+ GG + T++ARDAR + P+ 

Sbjct: 331 RLAEEQKDTARALDEYAQ — VGPGNDFLPAQLRQTDVLLKAGRVDEAAQRLDKARSEQPD 38 8 

Query: 372 YLFDKXXXXXXXXXXXXXXXXXXRQIGRVRKLPEQQGRYFTADNLSKIQMLALSKLPDKR 431 

Y A L 1+ ALS + 

Sbjct: 389 Y AIQLYLIEAEALSNNDQQE 408 

Query: 4 32 EALIGLNNIIAKLSAAGSTEPLAEALAQRSIIYEQFGKRGKMIADLETALKLTPDNAQIM 4 91 

+A + + + ELL RS++ E+ +M DL + PDNA + 

Sbjct: 4 09 KAWQAIQEGLKQYP EDL-NLLYTRSMLAEKRNDLAQMEKDLRFVIAREPDNAMAL 4 62 

Query: 4 92 NNLGYSLLSDSKRLDEGFALLQTAYQINPDDTAVNDSIGWAYYLKGDAESALPYLRYSFE 551 

N LGY+L + R E L+ A+++NPDD A+ DS+GW Y +G A YLR + + 
Sbjct: 4 63 NALGYTLADRTTRYGEARELILKAHKLNPDDPAILDSMGWINYRQGKLADAERYLRQALQ 522 

Query: 552 NDPEPEVAAHLGEVLWALGERDQAVDVWTQAAHLRGDKKIWRETLKR 598 

P+ EVAAHLGEVLWA G+A+W+ +D+R T+KR 
Sbjct: 523 RYPDHEVAAHLGEVLWAQGRQGDARAIWREYLDKQPDSDVLRRTIKR 569 

gi|2983399 (AE000710) hypothetical protein [Aquifex aeolicus] Length = 545
Score = 81.5 bits (198), Expect = 1e-14

Identities = 61/198 (30%), Positives = 98/198 (48%), Gaps = 19/198 (9%) 

Query: 408 GRYFTADNL— SKIQMLALSKLPDKREALIGLNNIIAKLSAAGSTEPLAEALAQ 459 

G Y A L K ++LA PDK+E L + +K + + L + 

Sbjct: 335 GNYEDAKRLIEKAKVLA P DKKE I L FLE AD Y Y S KT KQ Y DKALE ILKKLEKDYPNDSR 390 

Query: 4 60 RSIIYEQFGKRGKMIADLETALKLTPDNAQIMNNLGYSLLS — DSKRLDEGFALLQ 513 

+ I+Y+ G L A++L P+N N LGYSLL +R++E L++ 

Sbjct: 391 VYFMEAIVYDNLGDIKNAERALRKAIELDPENPDYYNYLGYSLLLWYGKERVEEAEELIK 450 

Query: 514 TAYQ INPDDTAVNDS I GWAYYLKGDAESALPYLRYS F-ENDPE PEVAAHLGEVLWALGER 572 

A + +P++ A DS+GW YYLKGD E A+ YL + E +P V H+G+VL +G + 
Sbjct: 451 KALEKDPENPAYIDSMGWVYYLKGDYERAMQYLLKALREAYDDPWNEHVGDVLLKMGYK 510 

Query: 573 DQAVDVWTQAAHLRGDKK 590 

++A + + +A L + K 
Sbjct: 511 EEARNYYERALKLLEEGK 528 
Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 7 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 49>: 

1 AACCTCTACG CCGGCCCGCA GACCACATCC GTCATCGCAA ACATCGCCGA 

51 CAACCT GCAA CTGGCCAAAG ACTACGGCAA AGTACACTGG TTCGCCTCCC 

101 CGCTCTTCTG GCTCCTGAAC CAACTGCACA ACATCATCGG CAACTGGGGC 

151 TGGGCGATTA TCGTTTTAAC CATCATCGTC AAAGCCGTAC TGTATCCATT 

201 GACCAACGCC TCTTACCGCT CTATGGCGAA AATGCGTGCC GCCGCACCCA 

251 AACTGCAAGC CAT CAAAGAG AAATACGGCG ACGACCGTAT GGCGCAACAA 

301 CAGGCGATGA TGCAGCTTTA CACAGACGAG AAAATCAACC CGgCTGGGCG 

351 GCTGCCTGCC TATGCTGTTG CAAATCCCCG TCTTCATCGG ATTGTATTGG 

401 GCATTGTTCG CCTCCGTAGA ATTGCGCCAG GCACCTTGGC TGGGTTGGAT 

451 TACCGACCTC AGCCGCGCCG ACCCCTACTA CATCCTGCCC ATCATTATGG 

501 CGGCAACGAT GTTCGCCCAA ACTTATCTGA ACCCGCCGCC GAcCGACCCG 

551 ATGCagGCGA AAATGATGAA AATCATGCCG TTGGTTTTCT CsGwCrTGTT 

601 CTTCTTCTTC CCTGCCGGks TGGTATTGTA CTGGGTAGTC AACAACCTCC 

651 TGACCATCGC CCAGCAATGG CACATCAACC GCAGCATCGA AAAACAACGC 

7 01 GCCCAAGGCG AAGTCGTTTC CTAA 

This corresponds to the amino acid sequence <SEQ ID 50; ORF11>:

1 . .NLYAGPQTTS VIANIADNLQ LAKDYGKVHW FASPLFWLLN QLHNIIGNWG 
51 W AIIVLTIIV KAVLYPLTN A SYRSMAKMRA AAPKLQAIKE KYGDDRMAQQ 

101 QAMMQLYTDE KINPLGGCLP MLLQIPVFIG LYWALFA 5VE LRQAPWLGWI 
151 TDLSRADPYY ILPIIMAATM FAQTYLNPPP TDPMQAKMMK IMP LVFSXXF 

201 FFFPAGXVLY WWNNLLT I A QQWHINRSIE KQRAQGEWS * 

Further sequence analysis revealed the complete DNA sequence <SEQ ID 51>:

1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT 

51 GATCGGCTGG GAAAAGATGT TCCCCACTCC GAAGCCAGTC CCCGCGCCCC 

101 AACAGGCAGC ACAACAACAG GCCGTAACCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGCAA CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT 

201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

251 CAACCGGCGA CGAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAAGAA 

301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT 

351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG 

4 01 GCGACAAAGT TGAAGTCCGC CTGAGCGCGC CTGAAACACG CGGTCTGAAA 

451 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG 

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG TTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTT TCCGACTTGG ACGACGATGC CAAATCCGGC AAATCCGAGG 

701 CCGAATACAT CCGCAAAACC CCGACCGGCT GGCTCGGCAT GATTGAACAC 

751 CACTTCATGT CCACCTGGAT TCTCCAACCT AAAGGCAGAC AAAGCGTTTG 

801 CGCCGCAGGC GAGTGCAACA TCGACATCAA ACGCCGCAAC GACAAGCTGT 

851 ACAGCACCAG CGTCAGCGTG CCTTTAGCCG CCATCCAAAA CGGCGCGAAA 

901 GCCGAAGCCT CCATCAACCT CTACGCCGGC CCGCAGACCA CATCCGTCAT 

951 CGCAAACATC GCCGACAACC TGCAACTGGC CAAAGACTAC GGCAAAGTAC 

1001 ACTGGTTCGC CTCCCCGCTC TTCTGGCTCC TGAACCAACT GCACAACAT C 

1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC 

1101 CGTACTGTAT CCATTGACCA ACGCCTCTTA CCGCTCTATG GCGAAAATGC 

1151 GTGCCGCCGC ACCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC 

12 01 CGTATGGCGC AACAACAGGC GATGATGCAG CTTTACACAG AC GAGAAAAT 

12 51 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA 

13 01 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT 
1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCCT ACTACATCCT 

14 01 GCCCATCATT ATGGCGGCAA CGATGTTCGC C CAAACT TAT CTGAACCCGC 
1451 CGCCGACCGA CCCGATGCAG GCGAAAATGA T G AAAAT CAT GCCGTTGGTT 
1501 TTCTCCGTCA TGTTCTTCTT CTTCCCTGCC GGTCTGGTAT TGTACTGGGT 
1551 AGTCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA
1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA 

This corresponds to the amino acid sequence <SEQ ID 52; ORF11-1>:



1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQQQ AVTASAEAAL 

51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFILFGDGKE 

101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK 

151 IDKVYTFTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT 

201 HSYVGPWYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH 

251 HFMSTWILQP KGRQSVCAAG ECNIDIKRRN DKLYSTSVSV PLAAIQNGAK 

301 AEASINLYAG PQTTSVIANI ADNLQLAKDY GKVHWFAS PL FWLLNQLHNI 

351 IGNWGW AIIV LTIIVKAVLY PLT NASYRSM AKMRAAAPKL QAIKEKYGDD 

401 RMAQQQAMMQ LYTDEKINPL GGCLP MLLQI PVFIGLYWAL FA SVELRQAP 

4 51 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV 

501 FSVMFFFFPA GLVLYW WNN LLTIAQQWHI NRS IEKQRAQ GEWS* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a 60kDa inner-membrane protein (accession P25754) of Pseudomonas putida

ORF11 and the 60kDa protein show 58% aa identity in 229 aa overlap (BLASTp).

ORFll 2 LYAGPQTT SVIANI ADNLQLAKDYGKVHWFAS PLFWLLNQLHNT I GNWGWAI IVLT I IVK 61 

LYAGP+ S + ++ L+L DYG + + A P+FWLL +H+++GNWGW+IIVLT+++K 
60K 324 LYAGPKIQSKLKELSPGLELTVDYGFLWFIAQPIFWLLQHIHSLLGNWGWSIIVLTMLIK 38 3 

ORFll 62 AVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRXXXXXXXXXLYTDEKINPLGGCLPM 121 

+ +PL+ ASYRSMA+MRA APKL A+KE++GDDR LY EKINPLGGCLP+ 

6 OK 384 GLFFPLSAASYRSMARMRAVAPKLAALKERFGDDRQKMSQAMMELYKKEKINPLGGCLPI 4 43 

ORFll 122 LLQI PVFIGLYWAL FAS VELRQAPWLGWITDLSRADPYYILP I IMAATMFAQT YLNPPPT 181 

L+Q+PVF+ LYW L SVE+RQAPW+ WITDLS DP++ILPIIM ATMF Q LNP P 
60K 444 LVQMPVFLALYWVLLESVEMRQAPWILWITDLSIKDPFFILPIIMGATMFIQQRLNPTPP 503 

ORFll 182 DPMQAKMMKIMPLVXXXXXXXXPAGXVLYWVVNNLLTIAQQWHINRSIE 230 

DPMQAK+MK+MP++ PAG VLYWWNN L+I+QQW+I R IE 

60K 504 DPMQAKVMKMMPIIFTFFFLWFPAGLVLYWWNNCLSISQQWYITRRIE 552 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF11 shows 97.9% identity over a 240aa overlap with an ORF (ORF11a) from strain A of N.
meningitidis:



10 20 30 

NLYAGPQTTSVIAN IADNLQLAKDYGKVHW 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
IKRRNDKLYSTSVSVPLAAIQNGAKSXASINLYAGPQTTSVIANIADNLQLXKDYGKVHW 
280 290 300 310 320 330 

40 50 60 70 80 90 

FASPLFWLLNQLHNIIGNWGWAIIVLTIIVKAVLYPLTNASYRSNAKMRAAAPKLQAIKE 
I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I | | I | | | | 
FASPLFWLLNQLHNIIGNWGWAIIVLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKE 
340 350 360 370 380 390 

100 110 120 130 140 150 

KYGDDRMAQQQAMMQLYTDEKINPLGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWI 
I I I I I I I I I I I I I 1 11 I I I I I I I I | | | | J | | | | ] | | | | | | | j J | | | | | | | | | | | | | | j | | 
KYGDDRMAQQQAMMQLYTDEKINPLGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWI 
400 410 420 430 440 450 

160 170 180 190 200 210 

T DLSRADPYYI LPI IMAATMFAQT YLN PP PT D PMQAKMMKIMPLVFSXXFFFFPAGXVLY 
I I I I I I I I I I I I I I M I I I I I I 1 I I I I I ! M | | | | | | 1 | 1 | | | | | | | | | | | | | | ,|, 
T DLSRADPYYI LPI IMAATMFAQT YLN PPPTDPMQAKMMKIMPLVXSXXFFXFPAGLVLY 
460 470 480 490 500 510 
220 230 240

orfll pep WVWNLLTIAQQWHINRSIEKQRAQGEWSX 
I I : I I I I I I I I I I I I I 1 I I I I I I I I I M M I 
o r f 1 1 a WVINNLLT IAQQWHINRS IEKQRAQGEWSX 

520 530 540 

The complete length ORF11a nucleotide sequence <SEQ ID 53> is:

1 ANGGATTTTA AAAGACT CAC NGNGTTTTTC GCCATCGCAC TGGTGATTAT 

51 GATCGGATNG NAAANGATGT TCCCCACTCC GAAGCCCGTC CCCGCGCCCC 

101 AACAGACGGC ACAACAACAG GCCGTAANCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGNAN CGCCGATTAC CGTAACGACC GACACGGTTC AAGCCGTCAT 

201 TGATGAAAAA AGCGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

251 CAACCGGCGA CNAAAATAAA CCGTTCATCC TGTTTGGCGA CGGCAAANAA 

301 TACACCTACN TCGCCCANTC CGAACTTTTG GACGCGCAGG G C AACAAC AT 

351 TCTAAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC AGCTTGGAAG 

4 01 GCGACAAAGT TGAAGTCCGC CTGAGCGCAC CTGAAACACG CGGTCTGAAA 

451 ATCGACAAAG TTTATACTTT CACCAAAGGC AGCTATCTGG TCAACGTCCG 

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTC TCCGACTTGG ACGACGATGC CAANTCCGGN AAATCCGAGG 

701 CCGAATACAT CCGCAAAACC CNGACCGGCT GGCTCGGCAT GAT TGAACAC 

751 CACTTCATGT CCACCTGGAT CCTCCAACCC AAAGGCGGAC AAAGCGTTTG 

801 CGCCGCTGGC GACTGCNGTA TN G AC AT CAA ACGCCGCAAC GACAAGCTGT 

851 ACAGC AC CAG CGTCAGCGTG CCTTTAGCCG CTATCCAAAA CGGTGCGAAA 

901 TCCNAAGCCT CCATCAACCT CTACGCCGGC CCACAGACCA CATCNGTTAT 

951 CGCAAACATC GCCGACAACC TGCAACTGGN CAAAGACTAC GGCAAAGTAC 

1001 ACTGGTTCGC CTCCCCCCTC TTTTGGCTTT TGAACCAACT GCACAACATC 

1051 ATCGGCAACT GGGGCTGGGC GATTATCGTT TTAACCATCA TCGTCAAAGC 

1101 CGTACTGTAT C CAT T G AC C A ACGCCTCTTA CCGTTCGATG GCGAAAATGC 

1151 GTGCCGCCGC GCCCAAACTG CAAGCCATCA AAGAGAAATA CGGCGACGAC 

12 01 CGTATGGCGC AGCAACAAGC CAT GAT GC AG CTTTACACAG ACGAGAAAAT 

1251 CAACCCGCTG GGCGGCTGCC TGCCTATGCT GTTGCAAATC CCCGTCTTCA 

1301 TCGGATTGTA TTGGGCATTG TTCGCCTCCG TAGAATTGCG CCAGGCACCT 

1351 TGGCTGGGTT GGATTACCGA CCTCAGCCGC GCCGACCCNT ACTACATCCT 

1401 GCCCATCATT ATGGCGGCAA CGATGTTCGC CCAAACCTAT CTGAACCCGC 

1451 CGCCGACCGA CCCGATGCAG GCGAAAAT GA TGAAAATCAT GCCTTTGGTT 

1501 NTNTCNNNNA NGTTCTTCKIN CTTCCCTGCC GGTCTGGTAT TGTACTGGGT 

1551 GATCAACAAC CTCCTGACCA TCGCCCAGCA ATGGCACATC AACCGCAGCA 

1601 TCGAAAAACA ACGCGCCCAA GGCGAAGTCG TTTCCTAA 

This encodes a protein having amino acid sequence <SEQ ID 54>: 



1 XDFKRLTXFF AIALVIMIGX XXMFPTPKPV PAPQQTAQQQ AVXASAEAAL 

51 APXXPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDXNK PFILFGDGKX 

101 YTYXAXSELL DAQGNNILKG IGFSAPKKQY SLEGDKVEVR LSAPETRGLK 

151 I DKVYT FTKG SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT 

2 01 HSYVGPVVYT PEGNFQKVSF SDLDDDAXSG KSEAEYIRKT XTGWLGMIEH 

2 51 HFMSTWILQP KGGQSVCAAG DCXXD IKRRN DKLYSTSVSV PLAAIQNGAK 

301 SXASINLYAG PQTTSVIANI ADNLQLXKDY GKVHWFASPL FWLLNQLHNI 

351 IGNWGW AIIV LTIIVKAVLY PLT NASYRSM AKMRAAAPKL QAIKEKYGDD 

4 01 RMAQQQAMMQ LYTDEKINPL GGCLP MLLQI PVF1GLYWAL FA SVELRQAP 

4 51 WLGWITDLSR ADPYYILPII MAATMFAQTY LNPPPTDPMQ AKMMKIMPLV 

501 XSXXFFXFPA GLVLY WVINN LLTIAQQWHI NRS IEKQRAQ GEVVS* 

ORF11a and ORF11-1 show 95.2% identity in 544 aa overlap:



10 20 30 40 50 60 

or f 11a . pep XDFKRLTXFFAIALVIMIGXXXMFPTPKPVPAPQQTAQQQAVXASAEAALAPXXPITVTT 
IMil I I I I I I I I I I I I I I I I I 1 I I I I I I : I I I I I I : I I I I I I I I ! : I I I I I I 
orfll-1 MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT 

10 20 30 40 50 60 



70 80 90 100 110 120 

DTVQAVIDEKSGDLRRLTLLKYKATGDXNKPFILFGDGKXYTYXAXSELLDAQGNNILKG 
1 I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I 111 I I I I I I I I I I II 11 I 
DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFILFGDGKEYTYVAQSELLDAQGNNILKG 
70



90 100 110 120 



130 140 150 160 170 180 

orflla oep IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDBCVYTFTKGSYLVNVRFDIANGSGQTANL 
ortna.pep | | | | [ | | | | | | | | | | | | | | 1 | | | | I I I M I I I I I I I I I I M I I I 1 I ! I I I I I I M I I I I I 

1 1 -1 IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYTFTKGSYLVNVRFDIANGSGQTANL 

° 130 140 150 160 170 180 

190 200 210 220 230 240 

or f 11a pep SADYRIVRDHSEPEGQGYFTHSYVGPVVYTPEGNFQKVSFSDLDDDAXSGKSEAEYIRKT 
| | | | | | | | | | | | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M M M I I I I I I I M I I I 
nT . f1 -, _-, SADYRIVRDHSEPEGQGYFTHSYVGPWYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT 

190 200 210 220 230 240 

250 260 270 280 290 300 

orflla pep XTGWLGMIEHHFMSTWILQPKGGQSVCAAGDCXXDIKRRNDKLYSTSVSVPLAAIQNGAK 

' P I I I I I I I I I I I I I I I I I I I Mll:l I I I I M M I I I I I ! I i M I I I I I I I I 

orfll-1 PTGWLGMIEHHFMSTWILQPKGRQSVCAAGECNIDIKRRNDKLYSTSVSVPLAAIQNGAK 

250 260 270 280 290 300 

310 320 330 340 350 360 

orflla pep SXASINLYAGPQTTSVIANIADNLQLXKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIV 
: | ! | | | | | i M I I I I II II I I I I I I I I M I I I I I I I I I I I I I I I I U I I I I I I I I I I I 
orfll-1 AEASINLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIV 

310 320 330 340 350 360 

370 380 390 400 410 420 

orflla pep LTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPL 

I I 1 I I I I I I I I I I I I I I I I I M I II I I I I I I I 1 I I I 1 

orfll-1 LTI I VKAVLYPLTNAS YRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKIN PL 

370 380 390 400 410 420 

430 440 450 460 470 480 

orflla pep GGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTY 
I | I I I I I I I I M I ! I I I ! I I I I I I I I I I I I M I I I I I I I I I I I I I I I II M II II I I I I I 
orfll-1 GGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTY 
430 440 450 460 470 480 

490 500 510 520 530 540 

orflla pep LNPPPTDPMQAKMMKIMPLVXSXXFFXFPAGLVLYWVINNLLTIAQQWHINRSIEKQRAQ 
I I i I I I I I I I I I I I I I I II I I II I I M I I I I I I : I I I I I II I II I I I I I I I I I I I I 
orfll-1 LNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRAQ 

490 500 510 520 530 540 

orflla. pep GEVVSX 
I I I I I I 

orfll-1 GEVVSX 
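
The identity figures quoted for these pairwise comparisons are, in essence, the number of alignment columns with identical residues divided by the length of the overlap. The following sketch illustrates that arithmetic on the first 60-column block of the ORF11a / ORF11-1 alignment. It is not the alignment program used for the original analysis, and the function names are hypothetical. It assumes that gap columns count toward the overlap length and that undetermined residues (X) are scored like any other mismatch, so reported values may have been computed slightly differently.

```python
def count_identities(a: str, b: str) -> int:
    """Count columns where both aligned rows carry the same residue (gaps excluded)."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for x, y in zip(a, b) if x == y and x != "-")


def percent_identity(a: str, b: str) -> float:
    """Identical columns as a percentage of all aligned columns."""
    return 100.0 * count_identities(a, b) / len(a)


# Residues 1-60 of the two aligned rows, copied from the alignment above.
orf11a = "XDFKRLTXFFAIALVIMIGXXXMFPTPKPVPAPQQTAQQQAVXASAEAALAPXXPITVTT"
orf11_1 = "MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT"

print(count_identities(orf11a, orf11_1), len(orf11a))
print(round(percent_identity(orf11a, orf11_1), 1))
```

In this block, eight of the nine non-identical columns involve an undetermined X residue in ORF11a rather than a true substitution, which shows how sequencing ambiguities can lower a reported identity figure.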

Homology with a predicted ORF from N. gonorrhoeae

ORF11 shows 96.3% identity over a 240aa overlap with a predicted ORF (ORF11.ng) from N.
gonorrhoeae:

Orfll NLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIIVLT 57 

I I I ! I I I I I I I I M I I I I II I I I I II I I I I I I I I I I I I II I I I I I I I I M I I I : I I I 
orfllng MAVNLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIWLT 60 

orfll IIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINPLGG 117 

I I I I I I I I I I I I I I I I I I I I I I I I I I : I I : I I I I I I I I I I I I I I I I I I I : I I : I I I II I 
orfllng IIVKAVLYPLTNASYRSMAKMRAAAPELQTIKEKYGDDRMAQQQAMMQLFEDEEINPLGG 120 

orfll CLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLN 17 7 

I I I I II I II I ! I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I I I 
orfllng CLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLN 180 
orf11 PPPTDPMQAKMMKIMPLVFSXXFFFFPAGXVLYWWNNLLTIAQQWHINRSIEKQRAQGE 237

° rf11 Milllllllllllllllll III IN I II II II M Ml IN II 

orfllng PPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWWNNLLT IAQQWHINRSIEKQRAQGE 24 0 

orfll VVS 240 

I I 1 

orfllng WS 243 

An ORF11ng nucleotide sequence <SEQ ID 55> was predicted to encode a protein having amino
acid sequence <SEQ ID 56>:

1 MAVNLYAGPQ TTSVIANIAD NLQLAKDYGK VHWFASPLFW LLNQLHNIIG 

51 NWGW AIWLT IIVKAVLYPL TN ASYRSMAK MRAAAPELQT IKEKYGDDRM 

101 AQQQAMMQLF EDEEINPLGG CLP MLLQIPV F I GLYWAL FA SVELRQAPWL 

151 GWITDLSRAD PYYILPIIMA ATMFAQTYLN PPPTDPMQAK MMKIMP LVFS 

201 VMFFFFPAGL VLYW WNNLL TIAQQWHINR SIEKQRAQGE WS* 

Further sequence analysis revealed the complete gonococcal DNA sequence <SEQ ID 57> to be: 

1 ATGGATTTTA AAAGACTCAC GGCGTTTTTC GCCATCGCGC TGGTGATTAT 

51 GATCGGCTGG GAAAAAATGT TCCCCACCCC GAAACCCGTC CCCGCGCCCC 

101 AACAGGCGGC ACAAAAACAG GCAGCAACCG CTTCCGCCGA AGCCGCGCTC 

151 GCGCCCGCAA CGCCGAT TAC CGTAACGACC GACACGGTTC AAGCCGTTAT 

201 TGATGAAAAA AGTGGCGACC TGCGCCGGCT GACCCTGCTC AAATACAAAG 

251 CAACCGGCGA CGAAAACAAA CCGTTCGTCC TGTTTGGCGA CGGCAAAGAA 

301 TACACCTACG TCGCCCAATC CGAACTTTTG GACGCGCAGG GCAACAACAT 

351 TCTGAAAGGC ATCGGCTTTA GCGCACCGAA AAAACAGTAC ACCCTCAACG 

401 GCGACACAGT CGAAGTCCGC CTGAGCGCGC CCGAAACCAA CGGACTGAAA 

451 ATCGACAAAG TCTATACCTT TACCAAAGAC AGCTATCTGG TCAACGTCCG 

501 CTTCGACATC GCCAACGGCA GCGGTCAAAC CGCCAACCTG AGCGCGGACT 

551 ACCGCATCGT CCGCGACCAC AGCGAACCCG AGGGTCAAGG CTACTTTACC 

601 CACTCTTACG TCGGCCCTGT TGTTTATACC CCTGAAGGCA ACTTCCAAAA 

651 AGTCAGCTTC TCCgacTTgg acgACGATGC gaaaTccggc aaATccgagg 

701 ccgaatacaT CCGCAAAACC ccgaccggtt ggctcggcat gattgaacac 

751 cacttcatgt ccacctggat cctccAAcct aaaggcggcc aaaacgtttg 

801 cgcccaggga gactgccgta tcgacattaa aCgccgcaac gacaagctgt 

851 acagcgcaag cgtcagcgtg cctttaaccg ctatcccaac ccgggggcca 

901 aaaccgaaaa tggcggTCAA CCTGTATGCC GGTCCGCAAA CCACATCCGT 

951 TATCGCAAAC ATCGCcgacA ACCTGCAACT GGCAAAAGAC TACGGTAAAG 

1001 TACACTGGTT CGCATCGCCG CTCTTCTGGC TCCTGAACCA ACTGCACAAC 

1051 ATTATCGGCA ACTGGGGCTG GGCAAT CGTC GTTTTGACCA TCATCGTCAA 

1101 AGCCGTACTG TATCCATTGA CCAACGcctc ctACCGTTCG ATGGCGAAAA 

1151 TGCGTGccgc cgcacCcaaA CTGCAGACCA TCAAAGAAAA ATAcgGCGAC 

12 01 GACCGTATGG CGCAACAGCA AGCGAT GATG CAGCTTTACA AAgacgAGAA 

1251 AATCAACCCG CTGGGCGGCT GTctgcctat gctgttgCAA ATCCCCGTCT 

1301 TCATCGGCTT GTACTGGGCA TTGTTCGCCT CCGTAGAATT GCGCCAGGCA 

1351 CCTTGGCTGG GCTGGATTAC CGACCTCAGC CGCGCCGACC CCTACTACAT 

1401 CCTGCCCATC ATTATGGCGG CAACGATGTT CGCCCAAACC TATCTGAACC 

1451 CGCCGCCGAC CGACCCGATG CAGGCGAAAA TGATGAAAAT CATGCCGTTG 

1501 GTTTTCTCCG TCATGTTCTT CTTCTTCCCT GCCGGTTTGG TTCTCTACTG 

1551 GGTGGTCAAC AACCTCCTGA CCATCGCCCA GCAGT GGCAC AT CAACCGCA 

1601 GCATCGAAAA ACAACGCGCC CAAGGCGAAG TCGTTTCCTA A 

This encodes a protein having amino acid sequence <SEQ ID 58; ORF11ng-1>:

1 MDFKRLTAFF AIALVIMIGW EKMFPTPKPV PAPQQAAQKQ AATASAEAAL 

51 APATPITVTT DTVQAVIDEK SGDLRRLTLL KYKATGDENK PFVLFGDGKE 

101 YTYVAQSELL DAQGNNILKG IGFSAPKKQY TLNGDTVEVR LSAPETNGLK 

151 I DKVYT FTKD SYLVNVRFDI ANGSGQTANL SADYRIVRDH SEPEGQGYFT 

2 01 HSYVGPWYT PEGNFQKVSF SDLDDDAKSG KSEAEYIRKT PTGWLGMIEH 

2 51 HFMSTWILQP KGGQNVCAQG DCRIDIKRRN DKLYSASVSV PLTAIPTRGP 

301 KPKMAVNLYA GPQTTSVIAN IADNLQLAKD YGKVHWFASP LFWLLNQLHN 

351 IIGNWGW AIV VLTIIVKAVL YPLTN ASYRS MAKMRAAAPK LQTIKEKYGD 

4 01 DRMAQQQAMM QLYKDEKINP LGGCLP MLLQ IPVFIGLYWA LFA SVELRQA 

4 51 PWLGWITDLS RADPYYILPI IMAATMFAQT YLNPPPTDPM QAKMMKIMPL 

501 VFSVMFFFFP AGLVLYW WN NLLTIAQQWH INRSIEKQRA QGEWS* 



ORF11ng-1 and ORF11-1 show 95.1% identity in 546 aa overlap:
orf11ng-1.pep MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQKQAATASAEAALAPATPITVTT
| 1 | | | | | | | M | | | I i I I I 1 I I I I I I I i I I I I 1 I I I M : 1 I : I I I II I I I I 1 I 1 I M I II 
rfll-1 MDFKRLTAFFAIALVIMIGWEKMFPTPKPVPAPQQAAQQQAVTASAEAALAPATPITVTT 



orf llng-1 .pep 



70 80 90 100 110 120 

DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFVLFGDGKEYTYVAQSELLDAQGNNILKG 
| | | | | | | I I I I I I I I I I I I I I I I I II I I I I I I : I II I I I I I I 1 1 I 1 I I I I I I I I I I I II I 
orf11-1 DTVQAVIDEKSGDLRRLTLLKYKATGDENKPFILFGDGKEYTYVAQSELLDAQGNNILKG

70 80 90 100 110 120

130 140 150 160 170 180 

orfllng-1 pep I GFS APKKQYT LNGDTVEVRLS APETNGLKI DKVYT FTKD S YLVNVRFDIANGSGQT ANL 
15 M | | M I I I I : I : I I I I I M M I I I I I I i II I I I I I I I 1 I I 1 I I I I I I I I I I I I I I I 

orf 11-1 IGFSAPKKQYSLEGDKVEVRLSAPETRGLKIDKVYT FTKGS YLVNVRFDIANGSGQT ANL 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf11ng-1.pep SADYRIVRDHSEPEGQGYFTHSYVGPVVYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT

|| | | | | | I I I I I II I II I I I I I I I II I II I II I II I I I I I I I M M M I 1 I I I I I I M I I 
orf 11-1 SADYRIVRDHSEPEGQGYFTHSYVGPVVYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKT 
190 200 210 220 230 240 

250 260 270 280 290 300

orf llng-1 .pep PT GWLGMIEHHFMSTWI LQPKGGQNVCAQGDCRI D I KRRNDKLYSASVSVPLTAI PTRGP 

M M I ! I I I II II I I I I I II I I I : I I I hi II I I I I I I I I I I : I : I I ■ I 

orf 11-1 PTGWLGMIEHHFMSTWILQPKGRQSVCAAGECNIDIKRRNDKLYSTSVSVPLAAIQN-GA 
250 260 270 280 290 


310 320 330 340 350 360 

orfllng-1 pep KPKMAVNLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAIV 

I : :: M I I I I I I M I I I I M I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I h 
orf 11-1 KAEASINLYAGPQTTSVIANIADNLQLAKDYGKVHWFASPLFWLLNQLHNIIGNWGWAII 
300 310 320 330 340 350

370 380 390 400 410 420 

orfllng-1 . pep VLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINP 

I I I I I I I I I I I I I I I I I M I I I I I I : II II II I I I I I I I I I II II I I 

orf11-1 VLTIIVKAVLYPLTNASYRSMAKMRAAAPKLQAIKEKYGDDRMAQQQAMMQLYTDEKINP

360 370 380 390 400 410 

430 440 450 460 470 480 

orf llng-1. pep LGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQT 
45 I I I I M I I I I I I I I M I I I I I I I I I M I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 

orf 11-1 LGGCLPMLLQIPVFIGLYWALFASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQT 
420 430 440 450 460 470 

490 500 510 520 530 540 

orf11ng-1.pep YLNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRA

I I I I I I I II I I I I I I I I I I 1 I I I I II I I I I I I II I I I ! II I I I I I I I I I I I I I M 

orf 11-1 YLNPPPTDPMQAKMMKIMPLVFSVMFFFFPAGLVLYWWNNLLTIAQQWHINRSIEKQRA 
480 490 500 510 520 530 


orf llng-1. pep QGEWSX 

orfll-1 QGEWSX 
540 

In addition, ORF11ng-1 shows significant homology with an inner-membrane protein from the
database (accession number P25754):

ID 60IM_PSEPU STANDARD; PRT; 560 AA. 

AC P25754; 

DT 01-MAY-1992 (REL. 22, CREATED) 

DT 01-MAY-1992 (REL. 22, LAST SEQUENCE UPDATE)

DT 01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE) 

DE 60 KD INNER-MEMBRANE PROTEIN. . . . 
SCORES Init1: 1074 Initn: 1293 Opt: 1103
Smith-Waterman score: 1406; 41.5% identity in 574 aa overlap



10 



orfllng-l.pep MDFKR' 
p25754 



-LTAFFAIALVIMIGW- 



- PKPVPAPQQAAQKQ 



MDIKRTILIAALAWSYVMVLKWNDDYGQAALPTQNTAASTVAPGLPDGVPAGNNGASAD 



50 



60 



70 



80 



rf llng-1 . pep AATAS AE AALAPAT PIT-- 



- VTTDTVQAVI DEKSGDLRRLTLLKYKATGDE -NKP F 
| | | : : I I : I 1 : : I : I I I h I II 

VPSANAESSPAELAPVALSKDLIRVKTDVLELAIDPVGGDIVQLNLPKYPRRQDHPNIPF 
70 80 90 100 110 120 



100 110 120 

orf llng-1 . pep VLFGDGKEYTYVAQSELLDAQGNNILKGIG- 

| | : I I : I : I I I I : : I : : : I : : I : I : I I : I : : I : : : : I 

p25754 QLFDNGGERVYLAQSGLTGTDGPDA-RASGRPLYAAEQKSYQLADGQEQLWDLKFS 

130 140 150 160 170 

150 160 170 180 190 200 

orf llng-1 pep TNGLKIDKVYTFTKDSYLVNVRFDIANGSGQT7ANLSADYRIVRDHS-EPEGQGYF-THSY 

||:: I : : I : I : I I : I I I I I : I : : : I I I : I : : I : I 

p2 5 7 5 4 DNGVNY IKRFS FKRGEYDLNVSYLI DNQSGQAWNGNMFAQLKRDAS GDPS S S TATGTATY 

180 190 200 210 220 230 

210 220 230 240 250 260 

VGPVVYTPEGNFQKVSFSDLDDDAKSGKSEAEYIRKTPTGWLGMIEHHFMSTWILQPKGG 



orf llng-1 .pep 



50 



270 280 290 300 310 320 

orf llng-1 . pep QNVCAQGDCRIDIKRRNDKLYSASVSVPLTAIPTRGPKPKMAVNLYAGPQTTSVIANIAD 

: | | :::::: I : : I : : : I : I I : : : I I I I I : I : : : : 

p2 5 7 5 4 NNV VQTRKD SQGNYI I GYTGPVI SVPA-GGKVET SALLYAGPKIQSKLKELS P 

290 300 310 320 330 

330 340 350 360 370 380 

orfllng-l.pep NLQLAKDYGKVHWF— ASPLFWLLNQLHNI IGNWGWAIWLTIIVKAVLYPLTNASYRSMA 

: | : | : | | | : I I I : I : I I I I : : : I : : : I II I I : I : I I I : : : I : : : : I I : I I M I I I 
p257 54 GLELTVDYGFL-WFIAQPIFWLLQHIHSLLGNWGWSIIVLTMLIKGLFFPLSAASYRSMA 
340 350 360 370 380 390 

390 400 410 420 430 440 

orfllng-l.pep KMRAAAPKLQTIKEKYGDDRMAQQQAMMQLYKDEKINPLGGCLPMLLQIPVFIGLYWALF 
: | | I : II I I : : | | : : I 1 I I : :: M I I : I I I I II I I I I I I I I : I : I : I II : : I I I = I = 
p25754 RMRAVAPKLAALKERFGDDRQKMSQAMMELYKKEKINPLGGCLPILVQMPVFLALYWVLL 
400 410 420 430 440 450 

450 460 470 480 490 500 

orf llng-1 .pep ASVELRQAPWLGWITDLSRADPYYILPIIMAATMFAQTYLNPPPTDPMQAKMMKIMPLVF 
111:11111: I I I I II I I :: I I I I I I : I I I I I III I I I I I I I : I I : 11 : : I 

p257 54 ESVEMRQAPWILWITDLSIKDPFFILPIIMGATMFIQQRLNPTPPDPMQAKVMKMMPIIF 
460 470 480 490 500 510 

510 520 530 540 

orf llng-1 -pep SVMFFFFPAGLVLYWVVNNLLTIAQQWHINRSIEKQRAQGEWSX 

: : I :: I I I I I I I I I I I I I I : I : I I I : I : I II 
p2 57 54 TFFFLWFPAGLVLYWWNNCLSISQQWYITRRIEAATKKAAA 
520 530 540 550 560 

Based on this analysis, including the homology to an inner-membrane protein from P. putida and
the predicted transmembrane domains (seen in both the meningococcal and gonococcal proteins), it
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 8 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 59>:

1 ..GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT
51 NAGCGCGGCT TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA
101 CGCCTGCCGC CGTCTTGACC GNCGCTCTGC TTTCCGCGCT GGGTATTTNG
151 TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA
201 GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGNCAC ACAGGCGGCA
251 ACCGTTACGA AGTT.TTTAT CGCGGTACG. ACTGGCAGGC TCAAAATACG
301 GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA
351 AGGCAACCTT CTTATTATCA CACACCCTTA A

This corresponds to the amino acid sequence <SEQ ID 60; ORF13>:

1 ..AVLIIELLTG TVYLLVVSAA LAGSGIAYGL TGSTPAAVLT XALLSALGIX
51 FVHAKTAVRK VETDSYQDLD AGQYVEILRH TGGNRYEVXY RGTXWQAQNT
101 GQEELEPGTR ALIVRKEGNL LIITHP*

Further sequence analysis elaborated the DNA sequence slightly <SEQ ID 61>:

1 ..GCCGTCTTAA TCATCGAATT ATTGACGGGA ACGGTTTATC TTTTGGTTGT
51 nAGCGCGGCT TTGGCGGGTT CGGGCATTGC TTACGGGCTG ACCGGCAGTA
101 CGCCTGCCGC CGTCTTGACC GnCGCTCTGC TTTCCGCGCT GGGTATTTnG
151 TTCGTACACG CCAAAACCGC CGTTAGAAAA GTTGAAACGG ATTCATATCA
201 GGATTTGGAT GCCGGACAAT ATGTCGAAAT CCTCCGACAC ACAGGCGGCA
251 ACCGTTACGA AGTTTTtTAT CGCGGTACGc ACTGGCAGGC TCAAAATACG
301 GGGCAAGAAG AGCTTGAACC AGGAACTCGC GCCCTCATTG TCCGCAAGGA
351 AGGCAACCTT CTTATTATCA CACACCCTTA A

This corresponds to the amino acid sequence <SEQ ID 62; ORF13-1>:

1 ..AVLIIELLTG TVYLLVVSAA LAGSGIAYGL TGSTPAAVLT XALLSALGIX
51 FVHAKTAVRK VETDSYQDLD AGQYVEILRH TGGNRYEVFY RGTHWQAQNT
101 GQEELEPGTR ALIVRKEGNL LIITHP*

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A)

ORF13 shows 92.9% identity over a 126aa overlap with an ORF (ORF13a) from strain A of N.
meningitidis:

orf13.pep            AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF 51
                     |||||||||||||||||||||||||||||||||||||||| |||||||| |
orf13a      MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF 60

orf13.pep   VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA 111
            ||||||| ||||||||||||||| ||||| ||||||| |||| |||||||||||||||||
orf13a      VHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYRGTHWQAQNTGQEELEPGTRA 120

The complete length ORF13a nucleotide sequence <SEQ ID 63> is:

1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT
51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG
101 GCATTGCTTA CGGGCTGACC GGCAGCACGC CTGCCGCCGT CTTGACCGCC
151 GCTCTGCTTT CCGCGCTGGG TATTTGGTTC GTACACGCCA AAACCGCCGT
201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATGCC GGGCAATATG
251 CCGAAATCCT CCGGCACGCA GGCGGCAACC GTTACGAAGT TTTTTATCGC
301 GGTACGCACT GGCAGGCTCA AAATACGGGG CAAGAAGAGC TTGAACCAGG
351 AACGCGCGCC CTAATCGTCC GCAAGGAAGG CAACCTTCTT ATCATCGCAA
401 AACCTTAA

This encodes a protein having amino acid sequence <SEQ ID 64>:

1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDA GQYAEILRHA GGNRYEVFYR
101 GTHWQAQNTG QEELEPGTRA LIVRKEGNLL IIAKP*

ORF13a and ORF13-1 show 94.4% identity in 126 aa overlap:

orf13a.pep  MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF 60
                     |||||||||||||||||||||||||||||||||||||||| |||||||| |
orf13-1              AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF 51

orf13a.pep  VHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYRGTHWQAQNTGQEELEPGTRA 120
            ||||||| ||||||||||||||| ||||| ||||||||||||||||||||||||||||||
orf13-1     VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNTGQEELEPGTRA 111
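
The identity figures quoted for these ungapped overlaps can be checked with a short script. The following is a minimal sketch, not the program used in the original analysis: the function name `percent_identity` and the `offset` parameter are illustrative, ambiguous residues (X) are assumed to count as mismatches, and the ORF13a C-terminus is taken as IIAKP, as encoded by SEQ ID 63.

```python
def percent_identity(query: str, subject: str, offset: int = 0) -> float:
    """Percent identity of `query` against `subject[offset:]`, without gaps.

    Assumption: an ambiguous residue (X) is scored as a mismatch.
    """
    overlap = subject[offset:offset + len(query)]
    if len(overlap) != len(query):
        raise ValueError("query extends past the end of subject")
    matches = sum(1 for q, s in zip(query, overlap) if q == s and q != "X")
    return 100.0 * matches / len(query)


# ORF13-1 (SEQ ID 62), 126 residues, leading ".." and trailing "*" omitted
ORF13_1 = (
    "AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIX"
    "FVHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNT"
    "GQEELEPGTRALIVRKEGNLLIITHP"
)
# ORF13a (SEQ ID 64), 135 residues
ORF13A = (
    "MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTA"
    "ALLSALGIWFVHAKTAVGKVETDSYQDLDAGQYAEILRHAGGNRYEVFYR"
    "GTHWQAQNTGQEELEPGTRALIVRKEGNLLIIAKP"
)

# ORF13-1 residue 1 aligns with ORF13a residue 10, hence offset=9
identity = percent_identity(ORF13_1, ORF13A, offset=9)
print(f"{identity:.1f}% identity over {len(ORF13_1)} aa")
```

Under these assumptions the script reports 119 matches in 126 positions, which rounds to the 94.4% stated above.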



Homology with a predicted ORF from N. gonorrhoeae

ORF13 shows 89.7% identity over a 126aa overlap with a predicted ORF (ORF13.ng) from N.
gonorrhoeae:

orf13                AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF 51
                     |||||||||||||||||||||||||||||||||||||||| |||||||| |
orf13ng     MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF 60

orf13       VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVXYRGTXWQAQNTGQEELEPGTRA 111
            ||||||| ||||||||||| | | |||| |||||||| |||| |||||||||  ||||||
orf13ng     VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA 120

orf13       LIVRKEGNLLIITHP 126
            ||||||||||||  |
orf13ng     LIVRKEGNLLIIANP 135





The complete length ORF13ng nucleotide sequence <SEQ ID 65> is:

1 ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT
51 GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG
101 GCATTGCCTA CGGGCTGACT GGCAGCACGC CTGCCGCCGT CTTGACCGCC
151 GCACTGCTTT CCGCGCTGGG CATTTGGTTC GTACATGCCA AAACCGCCGT
201 GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATACC GGAAAATATG
251 CCGAAATCCT CCGATACACA GGCGGCAACC GTTACGAAGT TTTTTATCGC
301 GGTACGCACT GGCAGGCGCA AAATACGGGG CAGGAAGTGT TTGAACCGGG
351 AACGCGCGCC CTCATCGTCC GCAAAGAAGG TAACCTTCTT ATCATCGCAA
401 ACCCTTAA

This encodes a protein having amino acid sequence <SEQ ID 66>:

1 MTVWFVAAVA VLIIELLTGT VYLLVVSAAL AGSGIAYGLT GSTPAAVLTA
51 ALLSALGIWF VHAKTAVGKV ETDSYQDLDT GKYAEILRYT GGNRYEVFYR
101 GTHWQAQNTG QEVFEPGTRA LIVRKEGNLL IIANP*
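
The relationship between a nucleotide listing and the protein it encodes can be reproduced by translating the open reading frame with the standard genetic code. The sketch below is illustrative only: the names `CODON_TABLE` and `translate` are assumptions introduced here, the standard code (rather than any organism-specific table) is assumed, and the input is SEQ ID 65 as listed above.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate frame 1 of `dna`; whitespace is ignored, '*' marks a stop.

    Codons containing anything other than A, C, G, T are reported as X.
    """
    dna = "".join(dna.split()).upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


# SEQ ID 65 (ORF13ng), copied row by row from the listing
SEQ_ID_65 = """
ATGACTGTAT GGTTTGTTGC CGCTGTTGCC GTCTTAATCA TCGAATTATT
GACGGGAACG GTTTATCTTT TGGTTGTCAG CGCGGCTTTG GCGGGTTCGG
GCATTGCCTA CGGGCTGACT GGCAGCACGC CTGCCGCCGT CTTGACCGCC
GCACTGCTTT CCGCGCTGGG CATTTGGTTC GTACATGCCA AAACCGCCGT
GGGAAAAGTT GAAACGGATT CATATCAGGA TTTGGATACC GGAAAATATG
CCGAAATCCT CCGATACACA GGCGGCAACC GTTACGAAGT TTTTTATCGC
GGTACGCACT GGCAGGCGCA AAATACGGGG CAGGAAGTGT TTGAACCGGG
AACGCGCGCC CTCATCGTCC GCAAAGAAGG TAACCTTCTT ATCATCGCAA
ACCCTTAA
"""

print(translate(SEQ_ID_65))
```

The translation ends in a stop codon after 135 residues and can be compared directly with SEQ ID 66.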

ORF13ng shows 91.3% identity in 126 aa overlap with ORF13-1:

orf13-1.pep          AVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTXALLSALGIXF 51
                     |||||||||||||||||||||||||||||||||||||||| |||||||| |
orf13ng     MTVWFVAAVAVLIIELLTGTVYLLVVSAALAGSGIAYGLTGSTPAAVLTAALLSALGIWF 60

orf13-1.pep VHAKTAVRKVETDSYQDLDAGQYVEILRHTGGNRYEVFYRGTHWQAQNTGQEELEPGTRA 111
            ||||||| ||||||||||| | | |||| |||||||||||||||||||||||  ||||||
orf13ng     VHAKTAVGKVETDSYQDLDTGKYAEILRYTGGNRYEVFYRGTHWQAQNTGQEVFEPGTRA 120

orf13-1.pep LIVRKEGNLLIITHPX 127
            ||||||||||||  ||
orf13ng     LIVRKEGNLLIIANPX 136

Based on this analysis, including the extensive leader sequence in this protein, it is predicted that
ORF13 and ORF13ng are likely to be outer membrane proteins. It is thus predicted that the proteins
from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines
or diagnostics, or for raising antibodies.

Example 9 

The following DNA sequence was identified in N. meningitidis <SEQ ID 67>: 

1 ATGTwTGATT TCGGTTTrGG CGArCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATwGtCCTC GGCCCCGAAC GCsTGCCCGA GGCCGCCCGC AyCGCCGGAC 

101 GGcTCATCGG CAGGCTGCAA CGCTTTGTCG GcAGCGTCAA ACAGGAATTT 

151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGcC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCGCT.TCC CGATGCGGCA AACACCCTAT CAGACGGCAT TTCCGACGTT 

401 ATGCCGTC..

This corresponds to the amino acid sequence <SEQ ID 68; ORF2>: 

1 MXDFGLGELV FVGIIALIVL GPERXPEAAR XAGRLIGRLQ RFVGSVKQEF 
51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK 
101 LPEQRTPADF GVDENGNPXS RCGKHPIRRH FRRYAV. . 
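
The lower-case letters in SEQ ID 67 (w, r, s, y) read as IUPAC ambiguity codes, and the X residues in SEQ ID 68 fall where an ambiguous codon could encode more than one amino acid. The following sketch illustrates that relationship under stated assumptions: the helper name `translate_ambiguous` is hypothetical, the standard genetic code is assumed, and any character that is not an IUPAC code (such as ".") is treated as fully unknown.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna: str) -> str:
    """Translate frame 1, writing X where a codon is not uniquely decodable."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Expand each position to the bases it may stand for (unknown -> any)
        choices = [IUPAC.get(base, "ACGT") for base in dna[i:i + 3]]
        residues = {CODON_TABLE["".join(c)] for c in product(*choices)}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# First 30 nucleotides of SEQ ID 67
print(translate_ambiguous("ATGTwTGATT TCGGTTTrGG CGArCTGGTT"))
```

Here TwT could be TAT (Y) or TTT (F) and is therefore written as X, whereas TTr and GAr each decode to a single residue (L and E), consistent with the start of SEQ ID 68.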

Further work revealed the complete nucleotide sequence <SEQ ID 69>: 

1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC 

101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT 

151 GACACTCAAA TCGAACTGGA AGAACTGAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAAGGCAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGGACACC TGCCGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCGCTTCCC GATGCGGCAA ACACCCTATC AGACGGCATT TCCGACGTTA 

401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG

451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGCGCATG

501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG 

551 AAGTCAGCTA TATCGATACT GCTGTTGAAA CGCCTGTTCC GCACACCACT 

601 TCCCTGCGCA AACAGGCAAT AAGCCGCAAA CGCGATTTTC GTCCGAAACA 

651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA 

This corresponds to the amino acid sequence <SEQ ID 70; ORF2-1>:

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF
51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK
101 LPEQRTPADF GVDENGNPLP DAANTLSDGI SDVMPSERSY ASAETLGDSG
151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT
201 SLRKQAISRK RDFRPKHRAK PKLRVRKS*

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 71>:

1 ATGTTTGATT TCGGTTTGGG CGAGCTGGTT TTTGTCGGCA TTATCGCCCT 

51 GATTGTCCTC GGCCCCGAAC GCCTGCCCGA GGCCGCCCGC ACCGCCGGAC 

101 GGCTCATCGG CAGGCTGCAA CGCTTTGTCG GCAGCGTCAA ACAGGAATTT 

151 GACACGCAAA TCGAACTGGA AGAACTAAGG AAGGCAAAGC AGGAATTTGA 

201 AGCTGCCGCT GCTCAGGTTC GAGACAGCCT CAAAGAAACC GGTACGGATA 

251 TGGAGGGTAA TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA 

301 CTGCCCGAAC AGCGCACGCC TGCTGATTTC GGTGTCGATG AAAACGGCAA 

351 TCCCTTTCCC GATGCGGCAA ACACCCTATT AGACGGCATT TCCGACGTTA 

401 TGCCGTCCGA ACGTTCCTAC GCTTCCGCCG AAACCCTTGG GGACAGCGGG 

451 CAAACCGGCA GTACAGCCGA ACCCGCGGAA ACCGACCAAG ACCGTGCATG

501 GCGGGAATAC CTGACTGCTT CTGCCGCCGC ACCCGTCGTA CAGACCGTCG 

551 AAGTCAGCTA TATCGATACC GCTGTTGAAA CCCCTGTTCC GCATACCACT 

601 TCGCTGCGTA AACAGGCAAT AAGCCGCAAA CGCGATTTGC GTCCTAAATC 

651 CCGCGCCAAA CCTAAATTGC GCGTCCGTAA ATCATAA 

This encodes a protein having amino acid sequence <SEQ ID 72; ORF2a>:

1 MFDFGLGELV FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEF
51 DTQIELEELR KAKQEFEAAA AQVRDSLKET GTDMEGNLHD ISDGLKPWEK
101 LPEQRTPADF GVDENGNPFP DAANTLLDGI SDVMPSERSY ASAETLGDSG
151 QTGSTAEPAE TDQDRAWREY LTASAAAPVV QTVEVSYIDT AVETPVPHTT
201 SLRKQAISRK RDLRPKSRAK PKLRVRKS*

The originally-identified partial strain B sequence (ORF2) shows 97.5% identity over a 118aa
overlap with ORF2a:

orf2.pep    MXDFGLGELVFVGIIALIVLGPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR 60
            | |||||||||||||||||||||| ||||| |||||||||||||||||||||||||||||
orf2a       MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 60

orf2.pep    KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPXS 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf2a       KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP 120

orf2.pep    RCGKHPIRRHFRRYAV 136

orf2a       DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV 180

The complete strain B sequence (ORF2-1) and ORF2a show 98.2% identity in 228 aa overlap:

orf2a.pep   MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf2-1      MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 60

orf2a.pep   KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPFP 120
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |
orf2-1      KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP 120

orf2a.pep   DAANTLLDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV 180
            |||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||
orf2-1      DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV 180

orf2a.pep   QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDLRPKSRAKPKLRVRKSX 229
            |||||||||||||||||||||||||||||||| ||| ||||||||||||
orf2-1      QTVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKSX 229



Further work identified a partial DNA sequence <SEQ ID 73> in N. gonorrhoeae encoding the 
following amino acid sequence <SEQ ID 74; ORF2ng>: 

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL
51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK 
101 LPEQRTPADF GVDEKGNSLS RYGKHRIRRH FRRYAV* 

Further work identified the complete gonococcal gene sequence <SEQ ID 75>:

1 ATGTTTGATT TCGGTTTGGG CGAGCTGATT TTTGTCGGCA TTATCGCCCT
51 GATTGTCCTT GGTCCAGAAC GCCTGCCCGA AGCCGCCCGC ACTGCCGGAC
101 GGCTTATCGG CAGGCTGCAA CGCTTTGTAG GAAGCGTCAA ACAAGAACTT
151 GACACTCAAA TCGAACTGGA AGAGCTGAGG AAGGTCAAGC AGGCATTCGA
201 AGCTGCCGCC GCTCAGGTTC GAGACAGCCT CAAAGAAACC GATACGGATA
251 TGCAGAACAG TCTGCACGAC ATTTCCGACG GTCTGAAGCC TTGGGAAAAA
301 CTGCCCGAAC AGCGCACGCc tgccgatttc gGTGTCGATg AAAacggcaa
351 tccccttccc gATACGGCAA ACACCGTATC AGACGGCATT TCCGACGTTA
401 TGCCGTCTGA ACGTTCCGAT ACTtccgcCG AAACCCTTGG GGACGACAGG
451 CAAACCGGCA GTACAGCCGA ACCTGCGGAA ACCGACAAAG ACCGCGCATG
501 GCGGGAATAC CTGactgctt ctgccgccgc acctgtcgta Cagagggccg
551 tcgaagtcag ctaTATCGAT ACTGCTGTTG AAacgcctgT tccgcaCacc
601 acttccctgc gcaAACAGGC AATAAACCGC AAACGCGATT TttgtccgaA
651 ACACCGCGCc aAACCGAAat tgcgcgtcCG TAAATCATAA

This encodes a protein having the amino acid sequence <SEQ ID 76; ORF2ng-1>:

1 MFDFGLGELI FVGIIALIVL GPERLPEAAR TAGRLIGRLQ RFVGSVKQEL
51 DTQIELEELR KVKQAFEAAA AQVRDSLKET DTDMQNSLHD ISDGLKPWEK
101 LPEQRTPADF GVDENGNPLP DTANTVSDGI SDVMPSERSD TSAETLGDDR
151 QTGSTAEPAE TDKDRAWREY LTASAAAPVV QRAVEVSYID TAVETPVPHT
201 TSLRKQAINR KRDFCPKHRA KPKLRVRKS*







The originally-identified partial strain B sequence (ORF2) shows 87.5% identity over a 136aa
overlap with ORF2ng:

orf2.pep    MXDFGLGELVFVGIIALIVLGPERXPEAARXAGRLIGRLQRFVGSVKQEFDTQIELEELR 60
            | ||||||| |||||||||||||| ||||| |||||||||||||||||| ||||||||||
orf2ng      MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60

orf2.pep    KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPXS 120
            | || ||||||||||||||| |||   ||||||||||||||||||||||||||| ||
orf2ng      KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDEKGNSLP 120

orf2.pep    RCGKHPIRRHFRRYAV 136
            | ||| ||||||||||
orf2ng      RYGKHRIRRHFRRYAV 136

The complete strain B and gonococcal sequences (ORF2-1 & ORF2ng-1) show 91.7% identity in
229 aa overlap:

orf2-1.pep  MFDFGLGELVFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQEFDTQIELEELR 60
            ||||||||| ||||||||||||||||||||||||||||||||||||||| ||||||||||
orf2ng-1    MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60

orf2-1.pep  KAKQEFEAAAAQVRDSLKETGTDMEGNLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP 120
            | || ||||||||||||||| |||   |||||||||||||||||||||||||||||||||
orf2ng-1    KVKQAFEAAAAQVRDSLKETDTDMQNSLHDISDGLKPWEKLPEQRTPADFGVDENGNPLP 120

orf2-1.pep  DAANTLSDGISDVMPSERSYASAETLGDSGQTGSTAEPAETDQDRAWREYLTASAAAPVV 180
            | ||| |||||||||||||  |||||||  |||||||||||| |||||||||||||||||
orf2ng-1    DTANTVSDGISDVMPSERSDTSAETLGDDRQTGSTAEPAETDKDRAWREYLTASAAAPVV 180

orf2-1.pep  Q-TVEVSYIDTAVETPVPHTTSLRKQAISRKRDFRPKHRAKPKLRVRKSX 229
            |  ||||||||||||||||||||||||| ||||| |||||||||||||||
orf2ng-1    QRAVEVSYIDTAVETPVPHTTSLRKQAINRKRDFCPKHRAKPKLRVRKSX 230

Computer analysis of these amino acid sequences indicates a transmembrane region (underlined),
and also revealed homology (34% identity, 59% similarity) between the gonococcal sequence and the
TatB protein of E. coli:

gnl|PID|e1292181 (AJ005830) TatB protein [Escherichia coli] Length = 171
Score = 56.6 bits (134), Expect = 1e-07
Identities = 30/88 (34%), Positives = 52/88 (59%), Gaps = 1/88 (1%)

Query: 1  MFDFGLGELIFVGIIALIVLGPERLPEAARTAGRLIGRLQRFVGSVKQELDTQIELEELR 60
          MFD G  EL+ V II L+VLGP+RLP A +T    I  L+    +V+ EL  +++L+E +
Sbjct: 1  MFDIGFSELLLVFIIGLVVLGPQRLPVAVKTVAGWIRALRSLATTVQNELTQELKLQEFQ 60

Query: 61 -KVKQAFEAAAAQVRDSLKETDTDMQNS 87
            +K+  +A+   +   LK +  +++ +
Sbjct: 61 DSLKKVEKASLTNLTPELKASMDELRQA 88

Based on this analysis, it was predicted that ORF2, ORF2a and ORF2ng are likely to be membrane
proteins and so the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF2-1 (16kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 3A
shows the results of affinity purification of the GST-fusion protein, and Figure 3B shows the results
of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise mice,
whose sera were used for Western blots (Figure 3C), ELISA (positive result), and FACS analysis
(Figure 3D). These experiments confirm that ORF2-1 is a surface-exposed protein, and that it is
a useful immunogen.

Example 10 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 77>:

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC
51 CGC.TGCGGG ACACTGACAG GTATTCCATC GCATGGCGgA GkTAAACgCT
101 TTgCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA
151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC
201 CACTATGGGC GACCAAGGTT CAGGcAGTTT GACAGGGGGG TCGCTACTCC
251 ATTGATGCAC kGrTwCsTGG CGAATACATA AACAGCCCTG CCGTCCGTAC
301 CGATTACACC TATCCACGTT ACGAAACCAC CGCTGAAACA ACATCAGGCG
351 GTTTGACAGG TTTAACCACT TCTTTATCTA CACTTAATGC CCCTGCACTC
401 TCTCGCACCC AATCAGACGG TAGCGGAAGT AAAAGCAGTC TGGGCTTAAA
451 TATTGGCGGG ATGGGGGATT ATCGAAATGA AACCTTGACG ACTAACCCGC
501 GCGACACTGC CTTTCTTTCC CACTTGGTAC AGACCGTATT TTTCCTGCGC
551 GGCATAGACG TTGTTTCTCC TGCCAATGCC GATACAGATG TGTTTATTAA
601 CATCGACGTA TTCGGAACGA TACGCAACAG AACCGAAATG..

This corresponds to the amino acid sequence <SEQ ID 78; ORF15>:

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG XKRFAVEQEL VAASARAAVK
51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDAXXXG EYINSPAVRT
101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN
151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN
201 IDVFGTIRNR TEM..

Further work revealed the complete nucleotide sequence <SEQ ID 79>: 

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC 

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT 

101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA 

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC 

201 CACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA 

251 TTGATGCACT GATTCGTGGC GAATACATAA ACAGCCCTGC CGTCCGTACC 

301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG

351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT 

401 CTCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT 

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG 

501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG 

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 

701 GAACCAATAA AAAATTGCTC ATCAAACCAA AAACCAATGC GTTTGAAGCT

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA

801 AGGAATTAAA CCGACGGAAG GATTAATGGT CGATTTCTCC GATATCCGAC

851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC 

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GTAGTGCGAC AACATAGACA

951 AGGACAACCT TGA 

This corresponds to the amino acid sequence <SEQ ID 80; ORF15-1>:

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK
51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT
101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN
151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN
201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA
251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIRPYGNHTG NSAPSVEADN
301 SHEGYGYSDE VVRQHRQGQP*

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 81>:

1 ATGCAAGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC

51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGTAAACGCT

101 TTGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA

151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC

201 [illegible]

251 [illegible]

301 GATTACACCT ATCCACGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG

351 TTTGACAGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT

401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA AAAGCAGTCT GGGCTTAAAT

451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CTAACCCGCG 

501 CGACACTGCC TTTCTTTCCC ACTTGGTACA GACCGTATTT TTCCTGCGCG 

551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACGGATGT GTTTATTAAC 

601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA 

651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA 
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7 01 GAACCAATAA AAAATTGCTC AT CAAACCAA AAACCAATGC GTTTGAAGCT 

751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGACCGTATA AAGTAAGCAA 

801 AGGAATTAAA CCGACAGAAG GATTAATGGT CGATTTCTCC GATATCCAAC 

851 CATACGGCAA TCATATGGGT AACTCTGCCC CATCCGTAGA GGCTGATAAC 

901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC GACATAGACA 

951 AGGGCAACCT TGA 

This encodes a protein having amino acid sequence <SEQ ID 82; ORF15a>:

1 MQARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK
51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT
101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSKSSLGLN
151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN
201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA
251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHMG NSAPSVEADN
301 SHEGYGYSDE AVRRHRQGQP*

The originally-identified partial strain B sequence (ORF15) shows 98.1% identity over a 213aa
overlap with ORF15a:

orf15.pep   MQARLLIPILFSVFILSACGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR 60
            |||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||
orf15a      MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 60

orf15.pep   KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG 120
            ||||||||||||||||||||||||||   |||||||||||||||||||||||||||||||
orf15a      KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 120

orf15.pep   LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15a      LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180

orf15.pep   FLRGIDVVSPANADTDVFINIDVFGTIRNRTEM 213
            |||||||||||||||||||||||||||||||||
orf15a      FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 240

The complete strain B sequence (ORF15-1) and ORF15a show 98.8% identity in 320 aa overlap:

orf15a.pep  MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15-1     MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 60

orf15a.pep  KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15-1     KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 120

orf15a.pep  LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15-1     LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180

orf15a.pep  FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 240
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15-1     FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 240

orf15a.pep  IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHMGNSAPSVEADN 300
            |||||||||||||||||||||||||||||||||||||||||| ||||| |||||||||||
orf15-1     IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIRPYGNHTGNSAPSVEADN 300

orf15a.pep  SHEGYGYSDEAVRRHRQGQPX 321
            |||||||||| || |||||||
orf15-1     SHEGYGYSDEVVRQHRQGQPX 321

Further work identified the corresponding gene in N. gonorrhoeae <SEQ ID 83>: 

1 ATGCGGGCAC GGCTGCTGAT ACCTATTCTT TTTTCAGTTT TTATTTTATC
51 CGCCTGCGGG ACACTGACAG GTATTCCATC GCATGGCGGA GGCAAACGCT
101 TCGCGGTCGA ACAAGAACTT GTGGCCGCTT CTGCCAGAGC TGCCGTTAAA
151 GACATGGATT TACAGGCATT ACACGGACGA AAAGTTGCAT TGTACATTGC
201 AACTATGGGC GACCAAGGTT CAGGCAGTTT GACAGGGGGT CGCTACTCCA
251 TTGATGCACT GATTCGCGGC GAATACATAA ACAGCCCTGC CGTCCGCACC
301 GATTACACCT ATCCGCGTTA CGAAACCACC GCTGAAACAA CATCAGGCGG
351 TTTGACGGGT TTAACCACTT CTTTATCTAC ACTTAATGCC CCTGCACTCT
401 CGCGCACCCA ATCAGACGGT AGCGGAAGTA GGAGCAGTCT GGGCTTAAAT
451 ATTGGCGGGA TGGGGGATTA TCGAAATGAA ACCTTGACGA CCAACCCGCG
501 CGACACTGCC TTTCTTTCCC ACTTGGTGCA GACCGTATTT TTCCTGCGCG
551 GCATAGACGT TGTTTCTCCT GCCAATGCCG ATACAGATGT GTTTATTAAC
601 ATCGACGTAT TCGGAACGAT ACGCAACAGA ACCGAAATGC ACCTATACAA
651 TGCCGAAACA CTGAAAGCCC AAACAAAACT GGAATATTTC GCAGTAGACA
701 GAACCAATAA AAAATTGCTC ATCAAACCCA AAACCAATGC GTTTGAAGCT
751 GCCTATAAAG AAAATTACGC ATTGTGGATG GGGCCGTATA AAGTAAGCAA
801 AGGAATCAAA CCGACGGAAG GATTGATGGT CGATTTCTCC GATATCCAAC
851 CATACGGCAA TCATACGGGT AACTCCGCCC CATCCGTAGA GGCTGATAAC
901 AGTCATGAGG GGTATGGATA CAGCGATGAA GCAGTGCGAC AACATAGACA
951 AGGGCAACCT TGA

This encodes a protein having amino acid sequence <SEQ ID 84; ORF15ng>:

1 MRARLLIPIL FSVFILSACG TLTGIPSHGG GKRFAVEQEL VAASARAAVK
51 DMDLQALHGR KVALYIATMG DQGSGSLTGG RYSIDALIRG EYINSPAVRT
101 DYTYPRYETT AETTSGGLTG LTTSLSTLNA PALSRTQSDG SGSRSSLGLN
151 IGGMGDYRNE TLTTNPRDTA FLSHLVQTVF FLRGIDVVSP ANADTDVFIN
201 IDVFGTIRNR TEMHLYNAET LKAQTKLEYF AVDRTNKKLL IKPKTNAFEA
251 AYKENYALWM GPYKVSKGIK PTEGLMVDFS DIQPYGNHTG NSAPSVEADN
301 SHEGYGYSDE AVRQHRQGQP*

The originally-identified partial strain B sequence (ORF15) shows 97.2% identity over a 213aa
overlap with ORF15ng:

orf15.pep   MQARLLIPILFSVFILSACGTLTGIPSHGGXKRFAVEQELVAASARAAVKDMDLQALHGR 60
            | |||||||||||||||||||||||||||| |||||||||||||||||||||||||||||
orf15ng     MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 60

orf15.pep   KVALYIATMGDQGSGSLTGGRYSIDAXXXGEYINSPAVRTDYTYPRYETTAETTSGGLTG 120
            ||||||||||||||||||||||||||   |||||||||||||||||||||||||||||||
orf15ng     KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 120

orf15.pep   LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180
            ||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||
orf15ng     LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180

orf15.pep   FLRGIDVVSPANADTDVFINIDVFGTIRNRTEM 213
            |||||||||||||||||||||||||||||||||
orf15ng     FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 240

The complete strain B sequence (ORF15-1) and ORF15ng show 98.8% identity in 320 aa overlap:

orf15-1.pep MQARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 60
            | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng     MRARLLIPILFSVFILSACGTLTGIPSHGGGKRFAVEQELVAASARAAVKDMDLQALHGR 60

orf15-1.pep KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng     KVALYIATMGDQGSGSLTGGRYSIDALIRGEYINSPAVRTDYTYPRYETTAETTSGGLTG 120

orf15-1.pep LTTSLSTLNAPALSRTQSDGSGSKSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180
            ||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||
orf15ng     LTTSLSTLNAPALSRTQSDGSGSRSSLGLNIGGMGDYRNETLTTNPRDTAFLSHLVQTVF 180

orf15-1.pep FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 240
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf15ng     FLRGIDVVSPANADTDVFINIDVFGTIRNRTEMHLYNAETLKAQTKLEYFAVDRTNKKLL 240

orf15-1.pep IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIRPYGNHTGNSAPSVEADN 300
            |||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||
orf15ng     IKPKTNAFEAAYKENYALWMGPYKVSKGIKPTEGLMVDFSDIQPYGNHTGNSAPSVEADN 300

orf15-1.pep SHEGYGYSDEVVRQHRQGQPX 321
            |||||||||| ||||||||||
orf15ng     SHEGYGYSDEAVRQHRQGQPX 321



Computer analysis of these amino acid sequences reveals an ILSAC motif (putative membrane
lipoprotein lipid attachment site, as predicted by the MOTIFS program), which
indicates a putative leader sequence, and it was predicted that the proteins from N. meningitidis and
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for
raising antibodies.

ORF15-1 (31.7kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
4A shows the results of affinity purification of the GST-fusion protein, and Figure 4B shows the
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 4C) and ELISA (positive result). These
experiments confirm that ORF15-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 11 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 85>:

101 ATCCCCGCGT TCGGGCTTCA AATTTTCTTC ATCCTGTTTT TAACCGCCGT
151 CGCATTCAAA ACACTGCATA CCGACCCTCA GACGGCATCC CGCCCGCTGC
201 CCGGACTGCC CrGACTGACT GCGGTTTCCA CACTGTTCGG CACAATGTCG
251 AGCTGGGTCG GCATAGGCGG CGGTTCACTT TCCGTCCCCT TCTTAATCCA
301 CTGCGGCTTC CCCGCCCATA AAGCCATCGG CACATCATCC GGCCTTGCCT
351 GGCCGATTGC ACTCTCCGGC GCAATATCGT ATCTGCTCAA CGGCCTGAAT
401 ATTGCAGGAT TGCCCGAAGG GTCACTGGGC TTCCTTTACC TGCCCGCCGT
451 CGCCGTCCTC AGCGCGGCAA CCATTGCCTT TGCCCCGCTC GGTGTCAAAA
501 CCGCCCACAA ACTTTCTTCT GCCAAACTCA AAAAATC.TT CGGCATTATG
551 TTGCTTTTGA TTGCCGGAAA AATGCTGTAC AACCTGCTTT AA

This corresponds to the amino acid sequence <SEQ ID 86; ORF17>: 

1 . . GQHKKQAVNG KTVFTMMPGM IFGVFTGAFS AKYIPAFGLQ IFFILFLTAV 

51 AFKTLHTDPQ TASRPLPGLP XLTAVSTLFG TMSSWVGIGG GSLSVPFLIH 

101 CGFPAHKAIG TSSGLAWPIA LSGAISYLLN GLNIAGLPEG SLGFLYLPAV 

151 AVLSAATIAF APLGVKTAHK LSSAKLKKSF GIMLLLIAGK MLYNLL* 

Further work revealed the complete nucleotide sequence <SEQ ID 87>: 

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC 

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCGTATTTAC GATGATGCCG GGTATGATAT TCGGCGTATT CACGGGCGCA 

301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT 

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG 

401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG 

451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT 

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG 

601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA 

751 Tc.TTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT 

801 GCTTTAA 

This corresponds to the amino acid sequence <SEQ ID 88; ORF17-1>: 

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY 

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMIFGVFTGA 

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL 

151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL 

201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK 

251 XFGIMLLLIA GKMLYNLL* 
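
As a cross-check on listings of this kind, the following minimal Python sketch translates a nucleotide listing into a one-letter amino acid string using the standard genetic code. It is illustrative only and is not the method used in this application: the function name `translate` is hypothetical, reading frame 1 is assumed, and any codon containing an unread base (such as the `.` in the listing) is reported as `X`.

```python
from itertools import product

# Standard genetic code with codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1 (hypothetical helper).

    Whitespace is ignored, so 10-base groups can be pasted directly.
    Codons that are not plain A/C/G/T triplets translate to 'X'.
    A trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # First 30 bases of the complete ORF17 nucleotide listing.
    print(translate("ATGTGGCATT GGGACATTAT CTTAATCCTG"))
```

Applied to the first 30 bases of SEQ ID 87, the sketch returns MWHWDIILIL, which agrees with the start of SEQ ID 88.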

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical H. influenzae transmembrane protein HI0902 (accession number P44070) 
ORF17 and HI0902 proteins show 28% aa identity in 192 aa overlap: 

ORF17    3 HKKQAVNGKTVFTMMPGMIFGVFT-GAFSAKYIPAFGLQIF--FILFLTAVAFKTLHTDP 59
           HK   +  + V  + P ++  VF  G F  +       +IF   +++L      ++  D
HI0902  72 HKLGNIVWQAVRILAPVIMLSVFICGLFIGRLDREISAKIFACLVVYLATKMVLSIKKD- 130

ORF17   60 QTASRPLPGLPXLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPI 119
           Q  ++ L  L  +     L G  SS  GIGGG   VPFL   G    +AIG+S+     +
HI0902 131 QVTTKSLTPLSSVIG-GILIGMASSAAGIGGGGFIVPFLTARGINIKQAIGSSAFCGMLL 189

ORF17  120 ALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVXXXXXXXXXXXXXX 179
            +SG  S++++G     +PE SLG++YLPAV  ++A +   + LG
HI0902 190 GISGMFSFIVSGWGNPLMPEYSLGYIYLPAVLGITATSFFTSKLGASATAKLPVSTLKKG 249

ORF17  180 FGIMLLLIAGKM 191
           F + L+++A  M
HI0902 250 FALFLIVVAINM 261

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF17 shows 96.9% identity over a 196aa overlap with an ORF (ORF17a) from strain A of N. 
meningitidis: 



orf17.pep                                 GQHKKQAVNGKTVFTMMPGMIFGVFTGAFS
                                          ||||||||: ||||||||||:||||:||:|

                    40        50        60        70        80        90
orf17.pep   AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG
            |||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||
orf17a      AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGG
                 110       120       130       140       150       160

                   100       110       120       130       140       150
orf17.pep   GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17a      GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV
                 170       180       190       200       210       220

                   160       170       180       190
orf17.pep   AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLLX
            |||||||||||||||||||||||||||||||||||||||||||||||
orf17a      AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLLX
                 230       240       250       260

The complete length ORF 17a nucleotide sequence <SEQ ID 89> is: 

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCCGTAG GCAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT TCGGCGTAGG CGGCGGCACG CTGATTGTCC 

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CACATCCTTC GCCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGCTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCGTATTTAC GATGATGCCG GGTATGGTAT TCGGCGTATT CGCTGGCGCA 

301 CTCTCCGCAA AATATATCCC AGCGTTCGGG CTTCAAATTT TCTTCATCCT 

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGAC CCTCAGACGG 

401 CATCCCGCCC GCTGCCCGGA CTGCCCGGAC TGACTGCGGT TTCCACACTG 

451 TTCGGCACAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT 

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG 

601 CTCAACGGCC TGAATATTGC AGGATTGCCC GAAGGGTCAC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAAAA 

751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT 

801 GCTTTAA 
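
Listings in this numbered, ten-character-grouped layout are awkward to compare directly. The short Python sketch below shows one way to collapse such a listing into a single contiguous string; the helper name `parse_listing` is hypothetical, and the sketch assumes that residues are letters (or `.` for unread bases) and that every digit on a line belongs to the position counter.

```python
import re


def parse_listing(text: str) -> str:
    """Join a numbered sequence listing into one string (hypothetical helper).

    Each line is expected to look like '751 TCCTTCGGCA TTATGTTGCT ...'.
    A position counter that was split by scanning (e.g. '7 51') is also
    removed, because all leading digits and spaces are stripped.
    """
    pieces = []
    for line in text.splitlines():
        # Strip the leading position counter and any spaces around it.
        body = re.sub(r"^[\d\s]+", "", line)
        # Remove the spaces that separate the ten-character groups.
        pieces.append(re.sub(r"\s+", "", body))
    return "".join(pieces)


if __name__ == "__main__":
    listing = """
    7 01 CGCTCGGTGT CAAAACCGCC

    751 TCCTTCGGCA TTATGTTGCT
    801 GCTTTAA
    """
    print(parse_listing(listing))
```

Partial sequences that open with `. .` would need those leading dots removed separately, since the sketch keeps `.` as a possible unread base.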

This encodes a protein having amino acid sequence <SEQ ID 90>: 

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY 

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTVFTMMP GMVFGVFAGA 

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTD PQTASRPLPG LPGLTAVSTL 

151 FGTMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL 

201 LNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKK 

251 SFGIMLLLIA GKMLYNLL* 

ORF17a and ORF17-1 show 98.9% identity in 268 aa overlap: 

                    10        20        30        40        50        60
orf17a.pep  MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17-1     MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf17a.pep  AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMVFGVFAGALSAKYIPAFGLQIFFILFLT
            ||||||||||||||||||||||||||||||||:||||:||||||||||||||||||||||
orf17-1     AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQIFFILFLT
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf17a.pep  AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17-1     AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf17a.pep  IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17-1     IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
                   190       200       210       220       230       240

                   250       260      269
orf17a.pep  HKLSSAKLKKSFGIMLLLIAGKMLYNLLX
            |||||||||| ||||||||||||||||||
orf17-1     HKLSSAKLKKXFGIMLLLIAGKMLYNLLX
                   250       260
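
The identity figures quoted for these overlaps can be reproduced by counting matching positions. The Python sketch below is a minimal illustration under stated assumptions: the function name `percent_identity` is hypothetical, the two strings are assumed to be already aligned and of equal length, and gap handling is not attempted. It is not the alignment program used to generate the printouts.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two pre-aligned sequences agree.

    Hypothetical helper: both inputs must have the same length, and an
    unread residue written as 'X' counts as a mismatch against any
    other letter.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Residues 91-100 of ORF17a and ORF17-1, which differ at two positions.
    print(percent_identity("GMVFGVFAGA", "GMIFGVFTGA"))
```

Over the full overlap, the ORF17a and ORF17-1 listings differ at three positions (two substitutions and one unread residue), and 265 matches in 268 positions rounds to the 98.9% stated above.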



Homology with a predicted ORF from N. gonorrhoeae 

ORF17 shows 93.9% identity over a 196aa overlap with a predicted ORF (ORF17.ng) from N. 
gonorrhoeae: 

orf17.pep                                 GQHKKQAVNGKTVFTMMPGMIFGVFTGAFS 30
                                          ||||||||: ||:|:||||||||||:||:|
orf17ng     QGLAQHPYAQHLAVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALS 102

orf17.pep   AKYIPAFGLQIFFILFLTAVAFKTLHTDPQTASRPLPGLPXLTAVSTLFGTMSSWVGIGG 90
            |||||||||||||||||||||||||||  ||||||||||| |||||||||:|||||||||
orf17ng     AKYIPAFGLQIFFILFLTAVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGG 162

orf17.pep   GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAV 150
            ||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||
orf17ng     GSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAV 202

orf17.pep   AVLSAATIAFAPLGVKTAHKLSSAKLKKSFGIMLLLIAGKMLYNLL 196
            |||||||||||||||||||||||||||:||||||||||||||||||
orf17ng     AVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKMLYNLL 268

An ORF17ng nucleotide sequence <SEQ ID 91> is predicted to encode a protein having amino acid 
sequence <SEQ ID 92>: 



1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY 

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA 

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL 

151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL 

201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE 

251 SFGIMLLLIA GKMLYNLL* 

Further work revealed the complete gonococcal DNA sequence <SEQ ID 93>: 

1 ATGTGGCATT GGGACATTAT CTTAATCCTG CTTGCcgtag gcAGTGCGGC 

51 AGGTTTTATT GCCGGCCTGT Tcggtgtagg cggcgGTACG CTGATTGTCC 

101 CTGTCGTTTT ATGGGTGCTT GATTTGCAGG GTTTGGCACA ACATCCTTAC 

151 GCGCAACACC TCGCCGTCGG CAcaTccttc gcCGTCATGG TCTTCACCGC 

201 CTTTTCCAGT ATGTTGGGGC AGCACAAAAA ACAGGCGGTC GACTGGAAAA 

251 CCATATTTGC GATGATGCCG GGTATGATAT TCGGCGTATT CGCTGGCGCA 

301 CTCTCCGCAA AATATATCCC CGCGTTCGGG CTTCAAATTT TCTTCATCCT 

351 GTTTTTAACC GCCGTCGCAT TCAAAACACT GCATACCGGT CGTCAGACGG 

401 CATCCCGCCC GCTGCCCGGG CTGCCCGGAC TGACTGCGGT TTCCACACTG 

451 TTCGGCGCAA TGTCGAGCTG GGTCGGCATA GGCGGCGGTT CACTTTCCGT 

501 CCCCTTCTTA ATCCACTGCG GCTTCCCCGC CCATAAAGCC ATCGGCACAT 

551 CATCCGGCCT TGCCTGGCCG ATTGCACTCT CCGGCGCAAT ATCGTATCTG 

601 GTCAACGGTC TGAATATTGC AGGATTGCCC GAAGGGTCGC TGGGCTTCCT 

651 TTACCTGCCC GCCGTCGCCG TCCTCAGCGC GGCAACCATT GCCTTTGCCC 

701 CGCTCGGTGT CAAAACCGCC CACAAACTTT CTTCTGCCAA ACTCAAAGAA 

751 TCCTTCGGCA TTATGTTGCT TTTGATTGCC GGAAAAATGC TGTACAACCT 

801 GCTTTAA 

This corresponds to the amino acid sequence <SEQ ID 94; ORF17ng-1>: 

1 MWHWDIILIL LAVGSAAGFI AGLFGVGGGT LIVPVVLWVL DLQGLAQHPY 

51 AQHLAVGTSF AVMVFTAFSS MLGQHKKQAV DWKTIFAMMP GMIFGVFAGA 

101 LSAKYIPAFG LQIFFILFLT AVAFKTLHTG RQTASRPLPG LPGLTAVSTL 

151 FGAMSSWVGI GGGSLSVPFL IHCGFPAHKA IGTSSGLAWP IALSGAISYL 

201 VNGLNIAGLP EGSLGFLYLP AVAVLSAATI AFAPLGVKTA HKLSSAKLKE 

251 SFGIMLLLIA GKMLYNLL* 

ORF17ng-1 and ORF17-1 show 96.6% identity in 268 aa overlap: 

                    10        20        30        40        50        60
orf17-1.pep MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf17ng-1   MWHWDIILILLAVGSAAGFIAGLFGVGGGTLIVPVVLWVLDLQGLAQHPYAQHLAVGTSF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf17-1.pep AVMVFTAFSSMLGQHKKQAVDWKTVFTMMPGMIFGVFTGALSAKYIPAFGLQIFFILFLT
            ||||||||||||||||||||||||:|:||||||||||:||||||||||||||||||||||
orf17ng-1   AVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVFAGALSAKYIPAFGLQIFFILFLT
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf17-1.pep AVAFKTLHTDPQTASRPLPGLPGLTAVSTLFGTMSSWVGIGGGSLSVPFLIHCGFPAHKA
            |||||||||  |||||||||||||||||||||:|||||||||||||||||||||||||||
orf17ng-1   AVAFKTLHTGRQTASRPLPGLPGLTAVSTLFGAMSSWVGIGGGSLSVPFLIHCGFPAHKA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf17-1.pep IGTSSGLAWPIALSGAISYLLNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
            ||||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||
orf17ng-1   IGTSSGLAWPIALSGAISYLVNGLNIAGLPEGSLGFLYLPAVAVLSAATIAFAPLGVKTA
                   190       200       210       220       230       240

                   250       260      269
orf17-1.pep HKLSSAKLKKXFGIMLLLIAGKMLYNLLX
            |||||||||: ||||||||||||||||||
orf17ng-1   HKLSSAKLKESFGIMLLLIAGKMLYNLLX
                   250       260

In addition, ORF17ng-1 shows significant homology with a hypothetical H. influenzae protein: 

sp|P44070|Y902_HAEIN HYPOTHETICAL PROTEIN HI0902 pir||G64015 hypothetical protein 
HI0902 - Haemophilus influenzae (strain Rd KW20) gi|1573922 (U32772) H. influenzae 
predicted coding region HI0902 [Haemophilus influenzae] Length = 264 

Score = 74 (34.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23 
Identities = 15/43 (34%), Positives = 23/43 (53%) 

Query: 55 AVGTSFAVMVFTAFSSMLGQHKKQAVDWKTIFAMMPGMIFGVF 97
          A+GTSFA +V T   S    HK   + W+ +  + P ++  VF
Sbjct: 52 ALGTSFATIVITGIGSAQRHHKLGNIVWQAVRILAPVIMLSVF 94

Score = 195 (91.9 bits), Expect = 1.6e-23, Sum P(2) = 1.6e-23 
Identities = 44/114 (38%), Positives = 65/114 (57%) 

Query: 150 LFGAMSSWVGIGGGSLSVPFLIHCGFPAHKAIGTSSGLAWPIALSGAISYLVNGLNIAGL 209
           L G  SS  GIGGG   VPFL   G    +AIG+S+     + +SG  S++V+G     +
Sbjct: 148 LIGMASSAAGIGGGGFIVPFLTARGINIKQAIGSSAFCGMLLGISGMFSFIVSGWGNPLM 207

Query: 210 PEGSLGFLYLPAVAVLSAATIAFAPLGVKTAHKLSSAKLKESFGIMLLLIAGKM 263
           PE SLG++YLPAV  ++A +   + LG     KL  + LK+ F + L+++A  M
Sbjct: 208 PEYSLGYIYLPAVLGITATSFFTSKLGASATAKLPVSTLKKGFALFLIVVAINM 261

This analysis, including the homology with the hypothetical H.influenzae transmembrane protein, 
suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 12 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 95>: 

1 . . GGAAACGGAT GGCAGGCAGA CCCCGAACAT CCGCTGCTCG GGCTTTTTGC 

51 CGTCAGTAAT GTATCGATGA CGCTTGCTTT TGTCGGAATA TGTGCGTTGG 

101 TGCATTATTG CTTTTCGGGA ACGGTTCAAG TGTTTGTGTT TGCGGCACTG 

151 CTCAAACTTT ATGCGCTGAA GCCGGTTTAT TGGTTCGTGT TGCAGTTTGT 

201 GCTGATGGCG GTTGCCTATG TCCACCGCTG CGGTATAGAC CGGCAGCCGC 

251 CGTCAACGTT CGGCGGCTCG CAGCTGCGAC TCGGCGGGTT GACGGCAGCG 

301 TTGATGCAGG TCTCGGTACT GGTGCTGCTG CTTTCAGAAA TTGGAAGATA 

351 A 

This corresponds to the amino acid sequence <SEQ ID 96; ORF18>: 

1 . . GNGWQADPEH PLLGLFAVSN VSMTLAFVGI CALVHYCFSG TVQVFVFAAL 
51 LKLYALKPVY WFVLQFVLMA VAYVHRCGID RQPPSTFGGS QLRLGGLTAA 
101 LMQVSVLVLL LSEIGR* 

Further work revealed the complete nucleotide sequence <SEQ ID 97>: 

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT 

51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC 

151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA 

251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC 

351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG 

401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG 

451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA 

501 GCCGCCGTCA ACGTTCGGCG GCTCGCAGCT GCGACTCGGC GGGTTGACGG 

551 CAGCGTTGAT GCAGGTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA 

601 AGATAA 

This corresponds to the amino acid sequence <SEQ ID 98; ORF18-1>: 

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP 

51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL 

101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ 

151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQVS VLVLLLSEIG 

201 R* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF18 shows 98.3% identity over a 116aa overlap with an ORF (ORF18a) from strain A of N. 
meningitidis: 

                                                  10        20        30
orf18.pep                                 GNGWQADPEHPLLGLFAVSNVSMTLAFVGI
                                          ||||||||||||||||||||||||||||||
orf18a      TRAAPLFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI
               60        70        80        90       100       110

                    40        50        60        70        80        90
orf18.pep   CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS
            ||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
orf18a      CALVHYCFSXTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS
              120       130       140       150       160       170

                   100       110
orf18.pep   QLRLGGLTAALMQVSVLVLLLSEIGRX
            ||||||||||||| |||||||||||||
orf18a      QLRLGGLTAALMQXSVLVLLLSEIGRX
              180       190       200

The complete length ORF18a nucleotide sequence <SEQ ID 99> is: 

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGT ATGCGGCGGT 

51 TTTTCTGTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTATGCT GTGGCTGGGC ATATCGGTTT TGGGGGCAAA GCTGATGCCC 

151 GGCATATGGG GAATGACCCG CGCCGCGCCC TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGCATTGG AACCGGAAAA 

251 CGGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCTCT GCTCGGGCTG 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC 

351 GTTGGTGCAT TATTGCTTTT CGNGAACGGT TCAAGTGTTT GTGTTTGCGG 

401 CACTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG 

451 TTTGTGCTGA TGGCGGTTGC CTATGTCCAC CGCTGCGGTA TAGACCGGCA 

501 GCCGCCGTCA ACGTTCGGCG GNTCGCAGCT GCGACTCGGC GGGTTGACGG 

551 CAGCGTTGAT GCAGNTCTCG GTACTGGTGC TGCTGCTTTC AGAAATTGGA 

601 AGATAA 

This encodes a protein having amino acid sequence <SEQ ID 100>: 

1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIMLWLG ISVLGAKLMP 

51 GIWGMTRAAP LFIPHFYLTL GSIFFFIGHW NRKTDGNGWQ ADPEHPLLGL 

101 FAVSNVSMTL AFVGICALVH YCFSXTVQVF VFAALLKLYA LKPVYWFVLQ 

151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG GLTAALMQXS VLVLLLSEIG 

201 R* 

ORF18a and ORF18-1 show 99.0% identity in 201 aa overlap: 

                    10        20        30        40        50        60
orf18a.pep  MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf18a.pep  LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf18a.pep  YCFSXTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
            |||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18-1     YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
                   130       140       150       160       170       180

                   190       200
orf18a.pep  GLTAALMQXSVLVLLLSEIGRX



Homology with a predicted ORF from N. gonorrhoeae 

ORF18 shows 93.1% identity over a 116aa overlap with a predicted ORF (ORF18.ng) from N. 
gonorrhoeae: 

orf18.pep                                 GNGWQADPEHPLLGLFAVSNVSMTLAFVGI 30
                                          ||||||||||||||||||||||||||||||
orf18ng     TRAAPLFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGI 115

orf18.pep   CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS 90
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18ng     CALVHYCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGS 175

orf18.pep   QLRLGGLTAALMQVSVLVLLLSEIGR 116
            ||||| |:| ||||:| ::||:||||
orf18ng     QLRLGVLAAMLMQVAVTAMLLAEIGR 201

The complete length ORF18ng nucleotide sequence is <SEQ ID 101>: 

1 ATGATTTTGC TGCATTTGGA TTTTTTGTCT GCCTTACTGt aTGCGGcggt 

51 tttTctgTTT CTGATATTCC GCGCAGGAAT GTTGCAATGG TTTTGGGCGA 

101 GTATTGCGTT GTGGCTCGGC ATCTCGGTTT TAGGGGTAAA GCTGATGCCG 

151 GGGATGTGGG GAATGACCCG CGCCGCGCCT TTGTTCATCC CCCATTTTTA 

201 CCTGACTTTG GGCAGCATAT TTTTTTTCAT CGGGTATTGG AACCGGAAAA 

251 CAGATGGAAA CGGATGGCAG GCAGACCCCG AACATCCGCT GCTCGGGCTT 

301 TTTGCCGTCA GTAATGTATC GATGACGCTT GCTTTTGTCG GAATATGTGC 

351 GTTGGTGCAT TATTGCTTTT CGGGAACGGT TCAAGTGTTT GTGTTTGCGG 

401 CATTGCTCAA ACTTTATGCG CTGAAGCCGG TTTATTGGTT CGTGTTGCAG 

451 TTTGTATTGA TGGCGGttgC CTATGTCCAC CGCTGCGGTA TAGACCGGCA 

501 GCCGCCGTCA ACGTTCGGCG GTTCGCAGCT GCGACTCGGC GTGTTGGCGG 

551 CGATGTTGAT GCAGGTTGCG GTAACGGCGA TGCTGCTTGC CGAAATCGGC 

601 AGATGA 

This encodes a protein having amino acid sequence <SEQ ID 102>: 

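1 MILLHLDFLS ALLYAAVFLF LIFRAGMLQW FWASIALWLG ISVLGVKLMP 

51 GMWGMTRAAP LFIPHFYLTL GSIFFFIGYW NRKTDGNGWQ ADPEHPLLGL 

101 FAVSNVSMTL AFVGICALVH YCFSGTVQVF VFAALLKLYA LKPVYWFVLQ 

151 FVLMAVAYVH RCGIDRQPPS TFGGSQLRLG VLAAMLMQVA VTAMLLAEIG 

201 R* 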

This ORF18ng protein sequence shows 94.0% identity in 201 aa overlap with ORF18-1: 

                    10        20        30        40        50        60
orf18-1.pep MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLGISVLGAKLMPGIWGMTRAAP
            ||||||||||||||||||||||||||||||||||| |||||||||:|||||:||||||||
orf18ng     MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIALWLGISVLGVKLMPGMWGMTRAAP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf18-1.pep LFIPHFYLTLGSIFFFIGHWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
            ||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf18ng     LFIPHFYLTLGSIFFFIGYWNRKTDGNGWQADPEHPLLGLFAVSNVSMTLAFVGICALVH
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf18-1.pep YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf18ng     YCFSGTVQVFVFAALLKLYALKPVYWFVLQFVLMAVAYVHRCGIDRQPPSTFGGSQLRLG
                   130       140       150       160       170       180

                   190       200
orf18-1.pep GLTAALMQVSVLVLLLSEIGRX



Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
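
The text does not say how the putative transmembrane domains were identified. Purely as an illustration, the Python sketch below scans a protein with a Kyte-Doolittle hydropathy window, which is one common way of flagging candidate membrane-spanning segments. The function names, the 19-residue window and the 1.6 threshold are assumptions made for this sketch and are not taken from the application.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence.

    Hypothetical helper: unread residues such as 'X' score 0.0.
    Returns an empty list if the sequence is shorter than the window.
    """
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(seq: str, window: int = 19,
                         threshold: float = 1.6) -> list:
    """1-based start positions of windows whose mean meets the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i + 1 for i, mean in enumerate(profile) if mean >= threshold]


if __name__ == "__main__":
    # First 40 residues of the ORF18-1 listing.
    print(candidate_tm_windows("MILLHLDFLSALLYAAVFLFLIFRAGMLQWFWASIMLWLG"))
```

Runs of consecutive start positions indicate one hydrophobic stretch; a dedicated topology predictor would be needed to decide whether such a stretch actually spans the membrane.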
Example 13 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 103>: 

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTN ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT 

151 TTGGACAACC NCNTGACCGG ACGGCTNAAA AACATCATCA CCACCGTCGC 

201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC 

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CTT.CG.CTT CACCATTTTA 

301 GGCGCGGNCG . . . 

This corresponds to the amino acid sequence <SEQ ID 104; ORF19>: 

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 

51 LDNXXTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTXXFTIL 

101 GAX... 

Further work revealed the complete nucleotide sequence <SEQ ID 105>: 

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTT TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTTGTCGAT 

151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCA CCACCGTCGC 

201 CCTGTTCACC CTCTCCTCGC TCACGGCACA AAGCACCCTC GGCACAGGGC 

251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT CACCATTTTA 

301 GGCGCGGTCG GGCTCAAATA CCGCACCTTC GCCTTCGGTG CACTCGCCGT 

351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 

401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCCTC 

451 CTGTTCCAAA TCGTCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA 

501 CGCCTACGAC GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG 

551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG 

601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 

651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 

701 GTTACTACTT TGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC 

751 GTCGATTATC AGGAAATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 

801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG 

851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC 

901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA 

951 CGACAGTCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA 

1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA 

1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCAGCAGCCT 

1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG 

1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC 

1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC 

1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC 

1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC 

1351 TACTTCACCC CGTCTGTCGA AACCAAACTC TGGATTGTCA TCGCCAGTAC 

1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT 

1451 TCATTACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA 

1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT 

1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 

1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGTGC CTATCTCGAA 

1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA 

1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA 

1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA 

1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 

1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 

1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA 

1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT 

2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC 

2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGACAGCT CGAACCCTAC 

2101 TACCGCGCCT ACCGCCAAAT TCCGCACAGG CAGCCCCAAA ATGCAGCCTG 

2151 A 

This corresponds to the amino acid sequence <SEQ ID 106; ORF19-1>: 

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 

51 LDNRLTGRLK NIITTVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL 

101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAIL 

151 LFQIVLPHRP VQESVANAYD ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM 

201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH 

251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG 

301 RAIEGCRQSL RLLSDSNDSP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE 

351 NDRMGDTRIA ALETSSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT 

401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP 

451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV 

501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE 

551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ 

601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ 

651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY 

701 YRAYRQIPHR QPQNAA* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with predicted transmembrane protein YHFK of H. influenzae (accession number P44289) 
ORF19 and YHFK proteins show 45% aa identity in 97 aa overlap: 

orf19   6 LKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLKNIITT 65
          L   +I+++PVF +V  AA  +W       +MP +LGIIAGGLVDLDN  TGRLKN+  T
YHFK    5 LNAKVISTIPVFIAVNIAAVGIWFFDISSQSMPLILGIIAGGLVDLDNRLTGRLKNVFFT 64

orf19  66 VALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGA 102
          +  F++SS   Q  +G  + +I+ MT++T  FT++GA
YHFK   65 LIAFSISSFIVQLHIGKPIQYIVLMTVLTFIFTMIGA 101



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF19 shows 92.2% identity over a 102aa overlap with an ORF (ORF19a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf19.pep   MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK
            |||| ||||||||||||||||||||||||||||||||||||||||||||||||  |||||
orf19a      MKTPPLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK
                    10        20        30        40        50        60

                    70        80        90       100
orf19.pep   NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX
            |||:||||||||||:|||||||||||||||||||  |||:||
orf19a      NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMTFGFTIMGAVGLKYRTFAFGALAVATY
                    70        80        90       100       110       120

orf19a      TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQENVANAYEALGSYLEAKA
                   130       140       150       160       170       180

The complete length ORF 19a nucleotide sequence <SEQ ID 107> is: 



1 ATGAAAACCC CACCCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT 

51 CGCCAGTGTC TTTACCGCCG CCTCCATCGT CTGGCAGCTG GGCGAACCCA 

101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCTGGCGG CCTGGTCGAT 

151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC 

201 CCTGTTCACC CTCTCCTCAC TTGTCGCGCA AAGCACCCTC GGCACAGGTT 

251 TGCCATTCAT CCTCGCCATG ACCCTGATGA CTTTCGGCTT TACCATCATG 

301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT 

351 CGCCACCTAC ACCACACTTA CCTACACCCC CGAAACCTAC TGGCTGACCA 

401 ACCCCTTTAT GATTCTGTGC GGAACCGTAC TGTACAGCAC CGCCATCATC 

451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTTCAAGAAA ACGTCGCCAA 

501 CGCCTACGAA GCACTCGGCA GCTACCTCGA AGCCAAAGCC GACTTTTTCG 

551 ATCCCGACGA AGCCGAATGG ATAGGCAACC GCCACATCGA CCTCGCCATG 

601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT 

651 TTACCGCCTT CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC 

701 GCTACTACTT CGCCGCCCAA GACATACACG AACGCATCAG CTCCGCCCAC 

751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT 

801 CCGCATCCAC CGCCTGCTCG AAATGCAGGG ACAAGCCTGC CGCAACACCG 

851 CCCAAGCCCT GCGCGCAAGC AAAGACTACG TTTACAGCAA ACGCCTCGGC 

901 CGCGCCATCG AAGGCTGCCG CCAATCGCTG CGCCTCCTTT CAGACAGCAA 

951 CGACAATCCC GACATCCGCC ACCTGCGCCG CCTTCTCGAC AACCTCGGCA 

1001 GCGTCGACCA GCAGTTCCGC CAACTCCAGC ACAACGGCCT GCAGGCAGAA 

1051 AACGACCGCA TGGGCGACAC CCGCATCGCC GCCCTCGAAA CCGGCAGCCT 

1101 CAAAAACACC TGGCAGGCAA TCCGTCCGCA GCTAAACCTC GAATCAGGCG 

1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTTG TCGTTGCCGC CGCCTGCACC 

1201 ATCGTCGAAG CCCTCAACCT CAACCTCGGC TACTGGATAC TACTGACCGC 

1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTCCGCC 

1301 AGCGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC 

1351 TACTTTACCC CCTCCGTCGA AACCAAACTC TGGATCGTCA TCGCCAGTAC 

1401 CACCCTCTTT TTCATGACCC GCACCTACAA ATACAGCTTC TCGACATTTT 

1451 TCATCACCAT TCAAGCCCTG ACCAGCCTCT CCCTCGCAGG GTTGGACGTA 

1501 TACGCCGCCA TGCCCGTACG CATCATCGAC ACCATTATCG GCGCATCCCT 

1551 TGCCTGGGCG GCAGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC 

1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAACGGCGC CTATCTCGAA 

1651 AAAATCACCG AACGCCTCAA AAGCGGCGAA ACCGGCGACG ACGTCGAATA 

1701 CCGCGCCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA 

1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA 

1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC 

1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT 

1901 TTACCGCACA GTTCCACCTC GCCGCCGAAC ACACCGCCCA CATCTTCCAA 

1951 CACCTGCCCG AAACCGAACC CGACGACTTT CAGACAGCAC TGGATACACT 

2001 GCGCGGCGAA CTCGACACCC TCCGCACCCA CAGCAGCGGA ACACAAAGCC 

2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CCCGGCAGCT CGAACCCTAC 

2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG 

2151 A 

This encodes a protein having amino acid sequence <SEQ ID 108>: 

1 MKTPPLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD 

51 LDNRLTGRLK NIIATVALFT LSSLVAQSTL GTGLPFILAM TLMTFGFTIM 

101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII 

151 LFQIILPHRP VQENVANAYE ALGSYLEAKA DFFDPDEAEW IGNRHIDLAM 

201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH 

251 VDYQEMSEKF KNTDIIFRIH RLLEMQGQAC RNTAQALRAS KDYVYSKRLG 

301 RAIEGCRQSL RLLSDSNDNP DIRHLRRLLD NLGSVDQQFR QLQHNGLQAE 

351 NDRMGDTRIA ALETGSLKNT WQAIRPQLNL ESGVFRHAVR LSLVVAAACT 

401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVRQRIAGTV LGVIVGSLVP 

451 YFTPSVETKL WIVIASTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV 

501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSNGAYLE 

551 KITERLKSGE TGDDVEYRAT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ 

601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ 

651 HLPETEPDDF QTALDTLRGE LDTLRTHSSG TQSHILLQQL QLIARQLEPY 

701 YRAYRQIPHR QPQNAA* 

ORF19a and ORF19-1 show 98.3% identity in 716 aa overlap: 

10 20 30 40 50 60 

orf 19a. pep MKTPPLKPLLITSLPVFASVFTAASIVWQLGE PKLAMPFVLGIIAGGLVDLDNRLTGRLK 

I! 11 I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I 
orf 19-1 MKTPLLKPLLITSLPVFASVFTAASIVWQLGE PKLAMPFVLGIIAGGLVDLDNRLTGRLK 

50 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 19a. pep NIIATVALFTLSSLVAQSTLGTGLPFILAMTLMT FGFTIMGAVGLKYRTFAFGALAVATY 

I II : I I I I I I I I I I : I I I I I I I I I I M I I I I I 1 I I I I I I : I I I I I I I I I I I I I I I I I I I I 
55 orf 19-1 NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 19a. pep TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQENVANAYE ALGSYLEAKA 

60 I I I I I I I I I I I I I I I I I II I I I II I I I I I : I I M : I I I I I I I I : I I II I : I I I : I I I I I I 

orf 19-1 TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQIVLPHRPVQESVANAYDALGGYLEAKA 
130 140 150 160 170 180 

190 200 210 220 230 240 

65 orf 1 9a . pep DFFDPDEAEWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 

I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 
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orfl9-l D FFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 
190 200 210 220 230 240 

250 260 270 280 290 300 

orfl9a pep DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG 
M I I I I I I I I I I M I I I I I I I I I M M I I I I II I I I I I I I I 1 I I I I I I I I I I I I I I I I I I 
orf 19-1 DIHERISSAHVDYQEMSEKFKNTDIIFRIHRLLEMQGQACRNTAQALRASKDYVYSKRLG 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 19a pep RAIEGCRQSLRLLSDSNDNPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA 

Mill! I II II I I : I I I 1 I! I II I I I I I I I I I I I I I I I II 

orf 19-1 RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA 
310 320 330 340 350 360 

370 380 390 400 410 420 

orf 19a pep AL E T G S LKNT WQAI RPQLNLES GV FRHAVRL S L VVAAACT I VE ALN LN LGYW I L LT AL FV 

I I I I : 1 II I I I I I I I i I I I I I I I I I I I I I I I I I M II I I I I I I I I I I I I I I I I M I I I I I 
orf 19-1 ALETSSLECNTWQAIRPQLNLESGVFRHAVRLSLWAAACTIVEALNLNLGYWILLTALFV 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 19a . pep CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 1 I I I I I I I I I I I I I I I I I I I I I I I M I I 
orf 19-1 CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF 
430 440 450 460 470 480 

490 500 510 520 530 540 

orf 19a. pep STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL 
I I I I I I I I I I M I I I I II I I I I I I I I I II I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I 
orf 19-1 STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL 

490 500 510 520 " 530 540 

550 560 570 580 590 600 

orf 19a . pep AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ 

orf 19-1 AVCSNGAYLEKITERLKSGETGDDVEYRATRRRAHEHTAALSSTLSDMSSEPAKFADSLQ 
550 560 570 580 590 600 



                   670       680       690       700       710
orf19a.pep  QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf19-1     QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
                   670       680       690       700       710



Homology with a predicted ORF from N. gonorrhoeae

ORF19 shows 95.1% identity over a 102aa overlap with a predicted ORF (ORF19.ng) from N.
gonorrhoeae:

orf 19 .pep MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNXXTGRLK 60 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II M I I I I I I I I I I I 
orfl9ng MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 60 

orf 1 9 . pep NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTXXFTILGAX 

I I I : I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I 
orfl9ng NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 



103 
120 



An ORF19ng nucleotide sequence <SEQ ID 109> is predicted to encode a protein having amino
acid sequence <SEQ ID 110>:

1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII
151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH
251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG
301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE
351 NDRMGDTRIA ALETGSFKNT *

Further work revealed the complete nucleotide sequence <SEQ ID 111>:

1 ATGAAAACCC CACTCCTCAA GCCTCTGCTC ATTACCTCGC TTCCCGTTTT
51 CGCCAGTGTC TTTACCGCCG CCTCCATCGT CTGGCAGCTA GGCGAACCCA
101 AGCTCGCCAT GCCCTTCGTA CTCGGCATCA TCGCCGGCGG CCTGGTCGAT
151 TTGGACAACC GCCTGACCGG ACGGCTGAAA AACATCATCG CCACCGTCGC
201 CCTGTTTACC CTCTCCTCGC TCACGGCGCA AAGCACCCTC GGCACAGGGC
251 TGCCCTTCAT CCTCGCCATG ACCCTGATGA CCTTCGGCTT TACCATTTTA
301 GGCGCGGTCG GGCTGAAATA CCGCACCTTC GCCTTCGGCG CACTCGCCGT
351 CGCCACCTAC ACCACGCTTA CCTACACCCC CGAAACCTAC TGGCTGACCA
401 ACCCCTTCAT GATTTTATGC GGCACCGTAC TGTACAGCAC CGCCATCATC
451 CTGTTCCAAA TCATCCTGCC CCACCGCCCC GTCCAAGAAA GCGTCGCCAA
501 TGCCTACGAA GCACTCGGCG GCTACCTCGA AGCCAAAGCC GACTTCTTCG
551 ACCCCGATGA GGCAGCCTGG ATAGGCAACC GCCACATCGA CCTCGCCATG
601 AGCAACACCG GCGTCATCAC CGCCTTCAAC CAATGCCGTT CCGCCCTGTT
651 TTACCGTTTG CGCGGCAAAC ACCGCCACCC GCGCACCGCC AAAATGCTGC
701 [missing] CGCCGCCCAA GACATCCACG AACGCATCAG CTCCGCCCAC
751 GTCGACTACC AAGAGATGTC CGAAAAATTC AAAAACACCG ACATCATCTT
801 CCGCATCCGC CGCCTGCTCG AAATGCAGGG GCAGGCGTGC CGCAACACCG
851 [missing] CCGGTCGGGC AAAGACTAcg tTTACAGCAA ACGCCTCGGA
901 CGCGCCATcg aaggctgCCG CCAGTCGCtg cgcctCCTTt cagacggcaA
951 CGACAGTCCC GACATCCGCC ACCTGAGccg CCTTCTCGAC AACCTCGgca
1001 [missing] [missing] caactCCGAC ACAgcgactC CCCCGCcgaa
1051 Aacgaccgca tgggcgacaC CCGCATCGCC GCCCtcgaaa ccggcagctT
1101 caaaaaCAcc tggcaggCAA TCCGTCCGCa gctgaaCCTC GAATCatgCG
1151 TATTCCGCCA TGCCGTCCGC CTGTCCCTCG TCGTTGCCGC CGCCTGCACC
1201 ATCGTCgaag cCCTCAACCT CAACCTCGGC TACTGGATAC TGCTGACCGC
1251 CCTTTTCGTC TGCCAACCCA ACTACACCGC CACCAAAAGC CGCGTGTACC
1301 AACGCATCGC CGGCACCGTA CTCGGCGTAA TCGTCGGCTC GCTCGTCCCC
1351 TACTTCACCC CCTCCGTCGA AACCAAACTC TGGATTGTCA TCGCCGGTAC
1401 CACCCTGTTC TTCATGACCC GCACCTACAA ATACAGTTTC TCCACCTTCT
1451 TCATCACCAT TCAGGCACTG ACCAGCCTCT CCCTCGCAGG TTTGGACGTA
1501 TACGCCGCCA TGCCCGTGCG CATCATcgaC ACCATTATCG GCGCATCCCT
1551 TGCCTGGGCG GCGGTCAGCT ACCTGTGGCC AGACTGGAAA TACCTCACGC
1601 TCGAACGCAC CGCCGCCCTT GCCGTATGCA GCAGCGGCAC ATACCTCCAA
1651 AAAATTGCCG AACGCCTCAA AACCGGCGAA ACCGGCGACG ACATAGAATA
1701 CCGCATCACC CGCCGCCGCG CCCACGAACA CACCGCCGCC CTCAGCAGCA
1751 CCCTTTCCGA CATGAGCAGC GAACCCGCAA AATTCGCCGA CAGCCTGCAA
1801 CCCGGCTTTA CCCTGCTCAA AACCGGCTAC GCCCTGACCG GCTACATCTC
1851 CGCCCTCGGC GCATACCGCA GCGAAATGCA CGAAGAATGC AGCCCCGACT
1901 TTACCGCACA GTTCCACCTT GCCGCCGAAC ACACCGCCCA CATCTTCCAA
1951 CACCTGCCCG ACATGGGACC CGACGACTTT CAGACGGCAT TGGATACACT
2001 GCGCGGCGAA CTCGGCACCC TCCGCACCCG CAGCAGCGGA ACACAAAGCC
2051 ACATCCTCCT CCAACAGCTC CAACTCATCG CccgGCAACT CGAACCCTAC
2101 TACCGCGCCT ACCGACAAAT TCCGCACAGG CAGCCCCAAA ACGCAGCCTG
2151 A

This corresponds to the amino acid sequence <SEQ ID 112; ORF19ng-1>:



1 MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD
51 LDNRLTGRLK NIIATVALFT LSSLTAQSTL GTGLPFILAM TLMTFGFTIL
101 GAVGLKYRTF AFGALAVATY TTLTYTPETY WLTNPFMILC GTVLYSTAII
151 LFQIILPHRP VQESVANAYE ALGGYLEAKA DFFDPDEAAW IGNRHIDLAM
201 SNTGVITAFN QCRSALFYRL RGKHRHPRTA KMLRYYFAAQ DIHERISSAH
251 VDYQEMSEKF KNTDIIFRIR RLLEMQGQAC RNTAQAIRSG KDYVYSKRLG
301 RAIEGCRQSL RLLSDGNDSP DIRHLSRLLD NLGSVDQQFR QLRHSDSPAE
351 NDRMGDTRIA ALETGSFKNT WQAIRPQLNL ESCVFRHAVR LSLVVAAACT
401 IVEALNLNLG YWILLTALFV CQPNYTATKS RVYQRIAGTV LGVIVGSLVP
451 YFTPSVETKL WIVIAGTTLF FMTRTYKYSF STFFITIQAL TSLSLAGLDV
501 YAAMPVRIID TIIGASLAWA AVSYLWPDWK YLTLERTAAL AVCSSGTYLQ
551 KIAERLKTGE TGDDIEYRIT RRRAHEHTAA LSSTLSDMSS EPAKFADSLQ
601 PGFTLLKTGY ALTGYISALG AYRSEMHEEC SPDFTAQFHL AAEHTAHIFQ
651 HLPDMGPDDF QTALDTLRGE LGTLRTRSSG TQSHILLQQL QLIARQLEPY
701 YRAYRQIPHR QPQNAA*
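
The correspondence between a nucleotide listing and its amino acid sequence can be checked by translating the reading frame. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the function name `translate` and the use of `X` for codons containing ambiguous or unread bases are illustrative choices and are not taken from the application.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1; whitespace and letter case are ignored."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        # Codons with ambiguity codes or unread bases map to 'X'.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of SEQ ID 111 as printed in the listing above.
print(translate("ATGAAAACCC CACTCCTCAA GCCTCTGCTC"))
```

Applied to the first three ten-base blocks of SEQ ID 111, this yields `MKTPLLKPLL`, which agrees with the first ten residues of SEQ ID 112.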

ORF19ng-1 and ORF19-1 show 95.5% identity in 716 aa overlap:

10 20 30 40 50 60 

orf 19-1 pep MKTPLLKPLLITSLPVFASVFTAASIVWQLGE PKLAMPFVLGIIAGGLVDLDNRLTGRLK 
| I | 1 | | I I I I I I I I I I I I I I I I M I I I I I I I I 1 I I I I I I 1 I I I I I I I I I I I I I M M I i I 
orf 19nq-l MKTPLLKPLLITSLPVFASVFTAASIVWQLGEPKLAMPFVLGIIAGGLVDLDNRLTGRLK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 19-1 pep NIITTVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 
I I I : I i I [ I I I I II I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I ! I I I i 
orf 19ng-l NIIATVALFTLSSLTAQSTLGTGLPFILAMTLMTFGFTILGAVGLKYRTFAFGALAVATY 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 19-1 .pep TTLTYTPETYWLTNPFMILCGTVLYSTAILLFQIVL PHRPVQESVANAYDALGGYLEAKA 
I I I I I I I I I I M I I I I I M I I I I I I I I I I : I I I I : I I I I I I I I I I I I I I : I II I I I I II I 
orf 19ng-l TTLTYTPETYWLTNPFMILCGTVLYSTAIILFQIILPHRPVQESVANAYEALGGYLEAKA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 1 9-1 . pep DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 
M I II I I I I I I I M I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl9ng-l DFFDPDEAAWIGNRHIDLAMSNTGVITAFNQCRSALFYRLRGKHRHPRTAKMLRYYFAAQ 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 19-1 . pep D I HERI S S AHVD YQEMSEKFKNT D 1 1 FRI HRLLEMQGQACRNT AQALRAS KD YV YSKRLG 

I I I I I I I I M I II I I I I I I I I I I I I I I II : I I I I I I I I I II I I II I = I = = II I I I I I I I I 
orf 19ng-l DIHERISSAHVDYQEMSEKFKNTDIIFRIRRLLEMQGQACRNTAQAIRSGKDYVYSKRLG 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 19-1. pep RAIEGCRQSLRLLSDSNDSPDIRHLRRLLDNLGSVDQQFRQLQHNGLQAENDRMGDTRIA 
I II I II II I I I I I I I : I I I I I I I I I I II I I M I I I I I II II : I : I I I I I I I I I II I 
orf 19ng-l RAIEGCRQSLRLLSDGNDSPDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIA 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 19-1. pep ALET S S LKNTWQAIRPQLNLE SGVFRHAVRL SLWAAACT IVEALNLNLGYW I LLTAL FV 
I I I I : I : II I I I I I I I I I I I I I I I I I I I I II I I I I I I I 1 I I I II I I I I I I I I I I II I II 
orfl9ng-l ALETGSFKNTWQAIRPQLNLESCVFRHAVRL SLWAAACT IVEALNLNLGYWILLTALFV 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 1 9-1 . pep CQPNYTATKSRVRQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIASTTLFFMTRTYKYSF 
I I I I I I I I I I I I I I II I I [ II I I I I II II I I I I I II I I I I II I I : I I I I I I II I I I I I I 
orf 19ng-l CQPNYTATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSF 
430 440 450 460 470 480 

490 500 510 520 530 540 

orf 19-1 .pep STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL 
I I I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I II II M I I I I I I I I II I I 
orfl9ng-l STFFITIQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAAL 

490 500 510 520 530 540 



                    610       620       630       640       650       660
orf19-1.pep  PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPETEPDDF
             |||||||||||||||||||||||||||||||||||||||||||||||||||||:  ||||
orf19ng-1    PGFTLLKTGYALTGYISALGAYRSEMHEECSPDFTAQFHLAAEHTAHIFQHLPDMGPDDF
                    610       620       630       640       650       660

                    670       680       690       700       710
orf19-1.pep  QTALDTLRGELDTLRTHSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
             ||||||||||| ||||:||||||||||||||||||||||||||||||||||||||||
orf19ng-1    QTALDTLRGELGTLRTRSSGTQSHILLQQLQLIARQLEPYYRAYRQIPHRQPQNAAX
                    670       680       690       700       710
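
The identity figures quoted for these alignments are ratios of matching positions to the length of the overlap. The sketch below shows that calculation for an ungapped stretch; `percent_identity` is a hypothetical helper name, and the alignment program actually used for the application may treat gaps and overlap boundaries differently.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions carrying the same residue.

    Both arguments must be rows of one alignment (equal length);
    a gap character '-' is never counted as a match.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if not seq_a:
        raise ValueError("empty alignment")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


# Residues 661-680 of the ORF19-1 / ORF19ng-1 alignment shown above.
orf19_1 = "QTALDTLRGELDTLRTHSSG"
orf19ng_1 = "QTALDTLRGELGTLRTRSSG"
print(round(percent_identity(orf19_1, orf19ng_1), 1))
```

For this 20-residue window there are two mismatches (D/G and H/R), so the function returns 90.0.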

In addition, ORF19ng-1 shows significant homology to a hypothetical gonococcal protein
previously entered in the databases:

sp|O33369|YOR2_NEIGO HYPOTHETICAL 45.5 KD PROTEIN (ORF2) gnl|PID|e1154438
(AJ002423) hypothetical protein [Neisseria gonorrh] Length = 417

Score = 1512 (705.6 bits), Expect = 5.3e-203, P = 5.3e-203 

Identities = 301/326 (92%), Positives = 306/326 (93%) 

Query: 307 RQSLRLLSDGNDSPDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS 366
           RQSLRLLSDGNDS DIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS
Sbjct: 1   RQSLRLLSDGNDSXDIRHLSRLLDNLGSVDQQFRQLRHSDSPAENDRMGDTRIAALETGS 60

Query: 367 FKNTWQAIRPQLNLESCVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTALFVCQPNYT 426
           FKNTWQAIRPQLNLES VFRHAVRLSLVVAAACTIVEALNLNLGYWILLT LFVCQPNYT
Sbjct: 61  FKNTWQAIRPQLNLESGVFRHAVRLSLVVAAACTIVEALNLNLGYWILLTRLFVCQPNYT 120

Query: 427 ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT 486
           ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT
Sbjct: 121 ATKSRVYQRIAGTVLGVIVGSLVPYFTPSVETKLWIVIAGTTLFFMTRTYKYSFSTFFIT 180

Query: 487 IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG 546
           IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG
Sbjct: 181 IQALTSLSLAGLDVYAAMPVRIIDTIIGASLAWAAVSYLWPDWKYLTLERTAALAVCSSG 240

Query: 547 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADSLQPGFTLL 606
           TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFAD+  P
Sbjct: 241 TYLQKIAERLKTGETGDDIEYRITRRRAHEHTAALSSTLSDMSSEPAKFADTCNPALPCS 300

Query: 607 KTGYALTGYISALGAYRSEMHEECSP 632
           K   ALTGYISALG   ++  +  +P
Sbjct: 301 KPATALTGYISALGHTAAKCTKNAAP 326

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein (the first of which is also seen in the meningococcal protein), and on homology 
with the YHFK protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
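
The application does not state how the putative transmembrane domains were predicted. One common approach is a sliding-window hydropathy profile; the sketch below uses the Kyte-Doolittle scale. The window length of 19 is a conventional choice for membrane-spanning segments and is an assumption here, as are the function name and the treatment of unknown residues as neutral.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Return the mean hydropathy of every full window along the sequence.

    Whitespace is ignored; residues outside the table (for example 'X')
    are scored as 0.0, which is an illustrative simplification.
    """
    seq = "".join(protein.split()).upper()
    scores = [KD.get(res, 0.0) for res in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# N-terminal 50 residues of ORF19ng-1 (SEQ ID 112) as listed above.
profile = hydropathy_profile(
    "MKTPLLKPLL ITSLPVFASV FTAASIVWQL GEPKLAMPFV LGIIAGGLVD"
)
print(len(profile), max(profile) > 0)
```

Windows whose mean value is strongly positive mark hydrophobic stretches that are candidates for membrane-spanning segments; any cut-off applied to the profile would be a further assumption.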

Example 14 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
113>:

1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGG.C GAAGCCTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG 

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCSAGTT 

351 TTGCCCAAGA TGCCGACAAA TTTCAGCTCT CCATCGATTT GCTGCGGATT 

401 ACGTTTCCTT ATATATTATT GATTTCCCTG TCTTCATTTG TCGGCTCGGT

451 ACTCAATTCT TATCATAAGT TCGGCATTCC GGCGTTTACG CCAC.GTTTC

501 TGAACGTGTC GTTTATCGTA TTCGCGCTGT TTTTCGTGCC GTATTTCGAT
551 CCGCCCGTTA CCGCGCyGGC GTGGGCGGTC TTTGTCGGCG GCATTTTGCA

601 ACTCGrmTTC CAACTGCCCT GGCTGGCGAA ACTGGGCTTT TTGAAACTGC 

651 CCAAACtGAG TTTCAAAGAT GCGGCGGTCA ACCGCGTGAT GAAACAGATG 

701 GCGCCTGCgA TTTTgGGCGT GAgCGTGGCG CAGGTTTCTT TGGTGATCAA

751 CACGATTTTc GCGTCTTATC TGCAATCGGG CAGCGTTTCA TGGATGTATT

801 ACGCCGACCG CATGATGGAG CTGCCCAGCG GCGTGCTGGG GGCGGCACTC
851 GGTACGATTT TGCTGCCGAC TTTGTCCAAA CACTCGGCAA ACCaAGATAC 
901 GGaACAGTTT TCCGCCCTGC TCGACTGGGG TTTGCGCCTG TGCATGCtgc 
951 TGACGCTGCC GGCGgcGGTC GGACTGGCGG TGTTGTCGTT cCCgCtGGTG 

1001 GCGACGCTGT TTATGTACCG CGwATTTACG CTGTTTGACG CGCAGATGAC 

1051 GCAACACGCG CTGATTGCCT ATTCTTTCGG TTTAATCGGC TTAATCATGA 

1101 TTAAAGTGTT GGCACCCGGC TTCTATGCGC GGCAAAACAT CAAwAitlGCCC 

1151 GTCAAAATCG CCATCTTCAC GCTCATCTGC mCGCAGTTGA TGAACCTTGs 

1201 CTTTAyCGGC CCACTrrAAC rCa^TCGGAC TTTCGCTTGC CATCGGTCTG 

1251 GGCGCGTGTA TCAATGCCGG ATTGTTGTTT TACCTGTTGC GCAGACACGG 

1301 TATTTACCAA CCTGG.CAAG GGTTGGGCAG CGTTCTT.AG CAAAAATGCT

1351 GcTCTCGCTC GCCGTGA 

This corresponds to the amino acid sequence <SEQ ID 114; ORF20>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL

51 LRRVFAEGAF AQAFVPILAE YKETRSKEAX EAFIRHVAGM LSFVLVIVTA 

101 LGILAAPWVI YVSAPSFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV 

151 LNSYHKFGIP AFTPXFLNVS FIVFALFFVP YFDPPVTAXA WAVFVGGILQ 

201 LXFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN 

251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT 

301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR XFTLFDAQMT
351 QHALIAYSFG LIGLIMIKVL APGFYARQNI XXPVKIAIFT LICXQLMNLX 

401 FXGPLXXIGL SLAIGLGACI NAGLLFYLLR RHGIYQPXQG LGSVLXQKCC
451 SRSP* 

These sequences were elaborated, and the complete DNA sequence <SEQ ID 115> is:

1 ATGAATATGC TGGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC 

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG 

101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TACAAGGAAA CGCGTTCAAA AGAGGCGGCG GAGGCTTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTTAT CGTTACCGCG 

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGGTTT 

351 TGCCCAAGAT GCCGACAAAT TTCAGCTCTC CATCGATTTG CTGCGGATTA 

401 CGTTTCCTTA TATATTATTG ATTTCCCTGT CTTCATTTGT CGGCTCGGTA

451 CTCAATTCTT ATCATAAGTT CGGCATTCCG GCGTTTACGC CCACGTTTCT 

501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC 

551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTCT TTGTCGGCGG CATTTTGCAA 

601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC 

651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG 

701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGGTTTCTTT GGTGATCAAC
751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA
801 CGCCGACCGC ATGATGGAGC TGCCCAGCGG CGTGCTGGGG GCGGCACTCG

851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG
901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT 
951 GACGCTGCCG GCGGCGGTCG GACTGGCGGT GTTGTCGTTC CCGCTGGTGG 

1001 CGACGCTGTT TATGTACCGC GAATTTACGC TGTTTGACGC GCAGATGACG 

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGCT TAATCATGAT

1101 TAAAGTGTTG GCACCCGGCT TCTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTTGCC 

1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG 

1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA 

1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTAGCAAA AATGCTGCTC 

1351 TCGCTCGCCG TGATGTGCGG CGGACTGTGG GCAGCGCAGG CTTACCTGCC 

1401 GTTTGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG 

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAACTGA 

This corresponds to the amino acid sequence <SEQ ID 116; ORF20-1>: 

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAA EAFIRHVAGM LSFVLVIVTA
101 LGILAAPWVI YVSAPGFAQD ADKFQLSIDL LRITFPYILL ISLSSFVGSV
151 LNSYHKFGIP AFTPTFLNVS FIVFALFFVP YFDPPVTALA WAVFVGGILQ
201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQVSLVIN
251 TIFASYLQSG SVSWMYYADR MMELPSGVLG AALGTILLPT LSKHSANQDT
301 EQFSALLDWG LRLCMLLTLP AAVGLAVLSF PLVATLFMYR EFTLFDAQMT
351 QHALIAYSFG LIGLIMIKVL APGFYARQNI KTPVKIAIFT LICTQLMNLA
401 FIGPLKHVGL SLAIGLGACI NAGLLFYLLR RHGIYQPGKG WAAFLAKMLL
451 SLAVMCGGLW AAQAYLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL
501 GFRPRHFKRV EN*
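
Sequence listings of this kind give a start position followed by blocks of ten residues. A short parser can join such rows into one string and confirm that each printed position matches the running length, which exposes dropped or duplicated blocks. This is a sketch only; `parse_listing` is a hypothetical name, and the two example rows are copied from the start of SEQ ID 116 above.

```python
import re


def parse_listing(text):
    """Join numbered listing rows into one contiguous sequence string.

    Each non-empty row must start with its 1-based start position.
    A ValueError is raised when that position disagrees with the number
    of residues read so far.
    """
    parts = []
    length = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = re.match(r"^(\d+)\s+(.*)$", line)
        if match is None:
            raise ValueError(f"row has no start position: {line!r}")
        start = int(match.group(1))
        block = re.sub(r"\s+", "", match.group(2))
        if start != length + 1:
            raise ValueError(f"row starts at {start}, expected {length + 1}")
        parts.append(block)
        length += len(block)
    return "".join(parts)


example = """
1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAA EAFIRHVAGM LSFVLVIVTA
"""
sequence = parse_listing(example)
print(len(sequence))
```

A row whose position marker was split during extraction (for instance `4 01` in place of `401`) fails the position check, which makes such damage easy to locate.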

Computer analysis of this amino acid sequence gave the following results: 

Homology with the MviN virulence factor of S. typhimurium (accession number P37169) 

ORF20 and MviN proteins show 63% aa identity in 440aa overlap: 

Orf20 1   MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60
          MN+L +LA V S+TM SRVLGF RD ++AR FGAGMATDAFFVAFKLPNLLRR+FAEGAF
MviN  14  MNLLKSLAAVSSMTMFSRVLGFARDAIVARIFGAGMATDAFFVAFKLPNLLRRIFAEGAF 73

Orf20 61  AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD 120
          +QAFVPILAEYK  + +EA   F+ +V+G+L+  L +VT  G+LAAPWVI V+AP FA
MviN  74  SQAFVPILAEYKSKQGEEATRIFVAYVSGLLTLALAVVTVAGMLAAPWVIMVTAPGFADT 133

Orf20 121 ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP 180
          ADKF L+  LLRITFPYILLISL+S VG++LN++++F IPAF P FLN+S I FALF  P
MviN  134 ADKFALTTQLLRITFPYILLISLASLVGAILNTWNRFSIPAFAPTFLNISMIGFALFAAP 193

Orf20 181 YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 240
          YF+PPV A AWAV VGG+LQL +QLP+L K+G L LP+++F+D    RV+KQM PAILGV
MviN  194 YFNPPVLALAWAVTVGGVLQLVYQLPYLKKIGMLVLPRINFRDTGAMRVVKQMGPAILGV 253

Orf20 241 SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 300
          SV+Q+SL+INTIFAS+L SGSVSWMYYADR+ME PSGVLG ALGTILLP+LSK  A+ +
MviN  254 SVSQISLIINTIFASFLASGSVSWMYYADRLMEFPSGVLGVALGTILLPSLSKSFASGNH 313

Orf20 301 EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG 360
          +++  L+DWGLRLC LL LP+AV L +L+ PL  +LF Y  FT FDA MTQ ALIAYS G
MviN  314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 373

Orf20 361 LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXXXXXXXXXXXXXXXXXCI 420
          LIGLI++KVLAPGFY+RQ+I  PVKIAI TLI  QLMNL F                 C+
MviN  374 LIGLIVVKVLAPGFYSRQDIKTPVKIAIVTLIMTQLMNLAFIGPLKHAGLSLSIGLAACL 433

Orf20 421 NAGLLFYLLRRHGIYQPXQG 440
          NA LL++ LR+  I+ P  G
MviN  434 NASLLYWQLRKQNIFTPQPG 453



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF20 shows 93.5% identity over a 447aa overlap with an ORF (ORF20a) from strain A of N. 
meningitidis: 



10 20 30 40 50 60 

orf 20 . pep MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 
I I I I I ] I : I I M I I I I I I I I I I I I I I I I I I 1 I i I I I I I I I I I I I I M I I I I 1 I I I I I I I I 
o r f 2 0 a MNMLGALVKVGS LTMVS RVLG FVRDT V I ARAFGAGMAT DAFFVAFKLPNLLRRVFAE GAF 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 20 . pep AQAFVPILAEYKETRSKEAXEAFIRHVAG MLSFVLVIVTALGILAA PWVIYVSAPSFAQD 
M I ! I I I I I I I I I I I I I I I : I I I II I I I I I I I I I I I I ! I I I I I I I I I I I i I 11 I I : N : I 

O r f 2 0 a AQAFVP I LAEYKE TRSKEATEAFIRHVAG MLSFVLVIVTALGILAA PWVI YVSAPG FAKD 

70 80 90 100 110 120 
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130 140 150 160 170 180 

orf?0 neD ADKFOLSIDLLRITFPYILLI SLSSFVGSVLN SYHKFGIPAFTPX FLNVSFIVFALFFVP 

P P m 1 1 1 1 1 1 1 1 1 1 1 1 TTTTTTTTTTTTTTTTT 1 1 1 1 1 1 : 1 1 1 1 1 1 : 1 1 I 1 1 1 1 1 1 1 1 1 1 1 1 

orf?0a ADKFOLSIDLLRITFPYIL LISLSSFVGSVLN SYHKFSIPAFTPT FLISIVSFIVFALFFVP 
130 140 150~ 160 170 180 

190 200 210 220 230 240 

orf20 pep YFDPPVTAXAWAVF VGGILQLX FQLPWLAKLGFLKLPKLSFKDAAVNRVMKQ MAPAILGV 

TT| | | | | I I M I I I I I I I I I I I I I I I I I I I I I I II I M I I I M I I I I I I I I I I 

orf20a YFDPPVTAL AWAVFVGGILQLG FQLPWLAKLGFLKLPKLSFKDAAVNRVMKQ MAPAILGV 

~~ 190 200 210 220 230 240 

250 260 270 280 290 300 

orf20 pep SVAQVS LV INT I FAS YLQSGSVSWMYYADRMME LP SGVLGAALGT ILLPTLSKHS ANQDT 

||M:||||| | | M I I I I I I I I I I I I I I I I I I I I I : I I I I 1 I 1 I I I I I I I I I I I I I I I I I 
or f 2 0a SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf2 0 pep EOFSALLDWGLR LCMLLTLPAAVGLAVLS FPLVATLFMYRXFTLFDAQMTQH ALIAYSFG 
1 | | | I I I I I I I I I I I I II I I I I! : I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I I 
orf 2 0a EOFSALLDWGLR XCMLLTLPAAVGMAVLS FPLVATLFMYREFTLFDAQMTQH ALIAYSFG 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 20 pep ■ LIGLIMIKVL APGFYARQNIXXPVK IAIFTLICXQLMNLXFX GPLXXIGLS LAIGLGACI 
I I I I I I I I I I I II I I I I I I I : I I I I I I II I I I : I I I I I I I M : I I I I I I I I I I I I 
orf 20a LIGLIMIKVLA PGFYARQNIKTPVK IAI FTLICTQLMNLAFI GPLKHVGLS LAIGLGACI 

370 380 390 400 410 420 

430 440 450 

orf 2 0 . pep NAGLLFYL LRRHGIYQPXQGLGSVLXQKCCSRSPX 

I I I I I I I I I I I II I I I I : I : : I : 
orf 2 0a NAGLLFYL LRRHGIYQPGKGW AAFLAKMLLSLAVMGGGL YAAQIWLPFDWAHAGGMQKAA 

430 440 450 460 470 480 

The complete length ORF20a nucleotide sequence <SEQ ID 117> is:

1 ATGAATATGC TGGGAGCTTT GGTAAAAGTC GGCAGCCTGA CGATGGTGTC 

51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGC GCATTCGGCG 

101 CAGGCATGGC GACGGATGCG TTCTTTGTCG CGTTCAAACT GCCCAACCTG 

151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT 

201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGACG GAGGCTTTTA 

251 TCCGCCATGT GGCGGGGATG CTGTCGTTTG TACTGGTCAT CGTTACCGCG 

301 CTGGGCATAC TTGCCGCGCC TTGGGTGATT TATGTTTCCG CACCCGGTTT 

351 TGCCAAAGAT GCCGACAAAT TTCAGCTCTC TATCGATTTG CTGCGGATTA 

401 CGTTTCCTTA TATCTTATTG ATTTCACTTT CCTCTTTTGT CGGCTCGGTA

451 CTCAATTCCT ATCATAAATT CAGCATTCCT GCGTTTACGC CCACGTTCCT

501 GAACGTGTCG TTTATCGTAT TCGCGCTGTT TTTCGTGCCG TATTTCGATC 

551 CTCCCGTTAC CGCGCTGGCT TGGGCGGTTT TTGTCGGCGG CATTTTGCAA 

601 CTCGGCTTCC AACTGCCCTG GCTGGCGAAA CTGGGTTTTT TGAAACTGCC 

651 CAAACTGAGT TTCAAAGATG CGGCGGTCAA CCGCGTGATG AAACAGATGG 

701 CGCCTGCGAT TTTGGGCGTG AGCGTGGCGC AGATTTCTTT GGTGATCAAC

751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTATTA 

801 CGCCGACCGC ATGATGGAAC TGCCCGGCGG CGTGCTGGGG GCGGCACTCG 

851 GTACGATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG 

901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCNTGT GCATGCTGCT 

951 GACGCTGCCG GCGGCGGTCG GAATGGCGGT GTTGTCGTTC CCGCTGGTGG 

1001 CAACCTTGTT TATGTACCGA GAATTCACGC TGTTTGACGC GCAGATGACG 

1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATCATGAT

1101 TAAAGTGTTG GCGCCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG 

1151 TCAAAATCGC CATCTTCACG CTCATTTGCA CGCAGTTGAT GAACCTTGCC 

1201 TTTATCGGCC CACTGAAACA CGTCGGACTT TCGCTTGCCA TCGGTCTGGG 

1251 CGCGTGTATC AATGCCGGAT TGTTGTTTTA CCTGTTGCGC AGACACGGTA 

1301 TTTACCAACC TGGCAAGGGT TGGGCAGCGT TCTTGGCAAA AATGCTGCTC 

1351 TCGCTCGCCG TGATGGGAGG CGGCCTGTAT GCCGCCCAAA TCTGGCTGCC 

1401 GTTCGACTGG GCACACGCCG GCGGAATGCA AAAGGCCGCC CGGCTCTTCA 

1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCACT GGCGGCTTTG 

1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA

This encodes a protein having amino acid sequence <SEQ ID 118>:

1 MNMLGALVKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLVIVTA
101 LGILAAPWVI YVSAPGFAKD ADKFQLSIDL LRITFPYILL ISLSSFVGSV
151 LNSYHKFSIP AFTPTFLNVS FIVFALFFVP YFDPPVTALA WAVFVGGILQ
201 LGFQLPWLAK LGFLKLPKLS FKDAAVNRVM KQMAPAILGV SVAQISLVIN
251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT
301 EQFSALLDWG LRXCMLLTLP AAVGMAVLSF PLVATLFMYR EFTLFDAQMT
351 QHALIAYSFG LIGLIMIKVL APGFYARQNI KTPVKIAIFT LICTQLMNLA
401 FIGPLKHVGL SLAIGLGACI NAGLLFYLLR RHGIYQPGKG WAAFLAKMLL
451 SLAVMGGGLY AAQIWLPFDW AHAGGMQKAA RLFILIAVGG GLYFASLAAL
501 GFRPRHFKRV ES*



ORF20a and ORF20-1 show 96.5% identity in 512 aa overlap: 



orf20a.pep  MNMLGALVKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF
            |||||||:||||||||||||||||||||||||||||||||||||||||||||||||||||
orf20-1     MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF
                    10        20        30        40        50        60



70 80 90 100 110 120 

AQAFVPILAE YKETRSKEATEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAKD 

I | 1 I I I I I II I I I I I II I I : I II I I I I I I II I II I I I I II I I I I I I I I I I I I I I I I I I : I 
AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD 

70 80 90 100 110 120 

130 140 150 160 170 180 

ADKFQLS I DLLRITFPYILLISLSSFVGSVLNSYHKFS I PAFTPTFLNVS FIVFALFFVP 

II I II I II I I I I I I I II I I II I I I I I I I I I I I I I I II : M II II I I I I I I I I I I II I I I I 

ADKFQLS IDLLRITFPYILLISLSSFVGSVLNSYHKFGI PAFTPTFLNVS FIVFALFFVP 
130 140 150 160 170 180 

190 200 210 220 230 240 

YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 
I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I M I II I I I I I 
Y FD PPVT ALAWAV FVGGI LQLGFQLPWLAKLG FLKL PKLS FKDAAVNRVMKQMAPAI LGV 

190 200 210 220 230 240 

250 260 270 280 290 300 

SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 
I I I I : I I I I I II I I I I I I II I I I I I I I I I I I I I I I : I I I I I I I I I I II M I I II I I I I I I 
SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 

250 260 270 280 290 300 

310 320 330 340 350 360 

EQ FS ALLDWGLRXCMLLTL PAAVGMAVLS FPLVATL FMYREFTL FDAQMTQHAL I AYS FG 
M I ! I I I I I I I I I I I I I I I I I 1 I : I I I I I II I I II I I I I I I I I I I 11 I 1 I I I I I I I I I I 
EQFS ALLDWGLRLCMLLTL PAAVGLAVLS FPLVATL FMYREFTLFDAQMTQHAL I AYS FG 

310 320 330 340 350 360 

370 380 390 400 410 420 

LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI 
I I ] I I II II I I I I I I I I I II I I I I I I 1 I I I I I I I I I I II I I I I I I I I I II I I I I I I I I II 
LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI 

370 380 390 400 410 420 

430 440 450 460 470 480 

NAGLL FYLLRRHG I YQPGKGWAAFLAKMLL S LAVMGGGL YAAQI WL P FDWAHAGGMQKAA 

I M II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I : I II : I I I I II ] : I I : 
NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG 
430 440 450 460 470 480 

490 500 510 

RL FI L I AVGGGLY FAS LAALG FRPRHFKRVE SX 
: I I I 1 I i I I I I I I I II I I I I I I I I I I I I I I : I 
QLCILIAVGGGLYFASLAALGFRPRHFKRVENX 

490 500 510

Homology with a predicted ORF from N. gonorrhoeae

ORF20 shows 92.1% identity over a 454aa overlap with a predicted ORF (ORF20ng) from N. 
gonorrhoeae: 

orf20 pep MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60 

I I I I I I I I I I I I I I I ! I I I I I I I II I I I II I I II I I I I I I I I I I I I I I I I I I I I I I M I I 
or f 2 Ong MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60 

orf20 pep AQAFVPILAEYKETRSKEAXEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPSFAQD 120 

| | I I I I M I I I M I I I I I I : I I I I I I I I I I I I I I I :: I I I I I I I I I I I I I I I I I 1 : I I 
or f 2 Ong AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIWTALGILAAPWVIYVSAPGFTKD 120 

orf20 pep ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPXFLNVSFIVFALFFVP 180 

I I I I I I I I : I I I I I I I i M I I I I I I I I I I : I I I I I II I I II II I : I I I : I I I I I I I I II I 
orf2 0ng ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 180 

orf20 pep YFDPPVTAXAWAVFVGGILQLXFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 240 

| | || ! | | | | | I 1 II I I I I I I I I I II I I I I I I I I I I I I : I I I I I I I I I I I I I M I I I I I 
orf 2 Ong YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV 24 0 

orf20 .pep SVAQVSLVINTIFASYLQSGSVSWMYYADRMMELPSGVLGAALGTILLPTLSKHSANQDT 300 

I I M : M M II I I I I I M I I I II I I I I I I II I I I I : M II I I I I I M I I I I I I I I I I I I I 
orf20ng SVAQISLVINTIFASYLQSGSVSWMYYADRMMELPGGVLGAALGTILLPTLSKHSANQDT 300 

orf 20 . pep EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYRXFTLFDAQMTQHALIAYSFG 3 60 

I I I I I II I I I I I I I I I I I I I I I : I I I I I II I I I I I 1 II II I I I I I I I I M I I I I M I I I 
orf20ng EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 3 60 

orf20 .pep LIGLIMIKVLAPGFYARQNIXXPVKIAIFTLICXQLMNLXFXGPLXXIGLSLAIGLGACI 420 

orf20ng LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 420 

orf 20. pep NAGLLFYLLRRHGIYQPXQGLGSVLXQKCCSRSP 454 

I I I I I I : I : I : I II I : I Mil: : I I I I I I I 
orf 2 Ong NAGLLFFLFRKHGIYRPGQGLGQPSWRKCCSRSP 454 

An ORF20ng nucleotide sequence <SEQ ID 119> was predicted to encode a protein having amino
acid sequence <SEQ ID 120>:

1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA
101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI
151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ
201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN
251 TIFASYLQSG SVSWMYYADR MMELPGGVLG AALGTILLPT LSKHSANQDT
301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT
351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQIMNLA
401 FIGPLKHAGL SLAIGLGACI NAGLLFFLFR KHGIYRPGQG LGQPSWRKCC
451 SRSP*



Further DNA sequence analysis revealed the following DNA sequence <SEQ ID 121>:

1 ATGAATATGC TTGGAGCTTT GGCAAAAGTC GGCAGCCTGA CGATGGTGTC
51 GCGCGTTTTG GGATTTGTGC GCGATACGGT CATTGCGCGG GCATTCGGCG
101 CGGGTATGGC GACGGATGCG TTTTTTGTCG CGTTCAAACT GCCCAACCTG
151 CTTCGCCGCG TGTTTGCGGA GGGGGCGTTT GCCCAAGCGT TTGTGCCGAT
201 TTTGGCGGAA TATAAGGAAA CGCGTTCTAA AGAGGCGAcg gAGGCTTTTA
251 TCCGCCACGt tgcgggAatg CTGTCGTTTG TGCTGATcgt cGttacCGCG
301 CTGGGCATAC TTGCCGCgcc tTGGGTGATT TATGTTtccg CgcccGGCTT
351 TACCAAAGAC GCGGACAAGT TCCAACTTTC CATCAGCCTG CTGCGGATTA
401 CGTTTCCTTA TATATTATTG ATTTCTTTGT CTTCTTTTGT CGGCTCGATA
451 CTCAATTCCT ACCATAAGTT CGGCATTCCC GCGTTTACGC CCACGTTTTT
501 AAACATCTCT TTTATCGTAT TCGCACTGTT TTTCGTGCCG TATTTCGATC
551 CGCCCGTTAC CGCGCTGGCG TGGGCGGTTT TTGTCGGCGG TATTTTGCAG
601 CTCGGTTTCC AACTGCCGTG GCTGGCGAAA CTGGGCTTTT TGAAACTGCC
651 CAAACTGAAT TTCAAAGATG CGGCGGTCAA CCGCGTCATG AAACAGATGG
701 CGCCTGCGAT TTTGGGCGTG agcgTGGCGC AAATTTCTTT GgttATCAAC
751 ACGATTTTCG CGTCTTATCT GCAATCGGGC AGCGTTTCAT GGATGTatta
801 cgCCGACCGC ATGATGGAGc tgcgccGGGG CGTGCTGGGG GCTGCACTCG
851 GTACAATTTT GCTGCCGACT TTGTCCAAAC ACTCGGCAAA CCAAGATACG
901 GAACAGTTTT CCGCCCTGCT CGACTGGGGT TTGCGCCTGT GCATGCTGCT
951 GACGCTGCCG GCGGCGGccg GACTGGCGGT ATTGTCGTTC CCGCTGGTGG
1001 CGACGCTGTT TATGTACCGA GAATTCACGC TGTTTGACGC ACAAATGACG
1051 CAACACGCGC TGATTGCCTA TTCTTTCGGT TTAATCGGTT TAATTATGAT
1101 TAAAGTGTTG GCATCCGGCT TTTATGCGCG GCAAAACATC AAAACGCCCG
1151 TCAAAATCGC CATCTTCACG CTCATCTGCA CGCAGTTGAT GAACCTCGCC
1201 TTTATCGGTC CGTTGAAACA CGCCGGGCTT TCGCTCGCCA TCGGCCTGGG
1251 CGCGTGCATC AACGCCGGAT TGTTGTTCTT CCTGTTGCGC AAACACGGTA
1301 TTTACCGGCC cggcaggggt tgggcggcgt TCTTGGCGAA AATGCTGCTC
1351 GCGCTCGCCG TGATGTGCGG CGGACTGTGG GCGGCGCAGG CTTGCCTGCC
1401 GTTCGAATGG GCGCACGCCG GCGGAATGCG GAAAGCGGGG CAGCTCTGCA
1451 TCCTGATTGC CGTCGGCGGC GGACTGTATT TCGCATCTCT GGCGGCTTTG
1501 GGCTTCCGTC CGCGCCATTT CAAACGCGTG GAAAGCTGA

This encodes the following amino acid sequence <SEQ ID 122; ORF20ng-1>:



1 MNMLGALAKV GSLTMVSRVL GFVRDTVIAR AFGAGMATDA FFVAFKLPNL
51 LRRVFAEGAF AQAFVPILAE YKETRSKEAT EAFIRHVAGM LSFVLIVVTA
101 LGILAAPWVI YVSAPGFTKD ADKFQLSISL LRITFPYILL ISLSSFVGSI
151 LNSYHKFGIP AFTPTFLNIS FIVFALFFVP YFDPPVTALA WAVFVGGILQ
201 LGFQLPWLAK LGFLKLPKLN FKDAAVNRVM KQMAPAILGV SVAQISLVIN
251 TIFASYLQSG SVSWMYYADR MMELRRGVLG AALGTILLPT LSKHSANQDT
301 EQFSALLDWG LRLCMLLTLP AAAGLAVLSF PLVATLFMYR EFTLFDAQMT
351 QHALIAYSFG LIGLIMIKVL ASGFYARQNI KTPVKIAIFT LICTQLMNLA
401 FIGPLKHAGL SLAIGLGACI NAGLLFFLLR KHGIYRPGRG WAAFLAKMLL
451 ALAVMCGGLW AAQACLPFEW AHAGGMRKAG QLCILIAVGG GLYFASLAAL
501 GFRPRHFKRV ES*



ORF20ng-1 and ORF20-1 show 95.7% identity in 512 aa overlap:



orf20-1.pep  MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf20ng-1    MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF
                     10        20        30        40        50        60



70 80 90 100 110 120 

AQAFVPILAEYKETRSKEAAEAFIRHVAGMLSFVLVIVTALGILAAPWVIYVSAPGFAQD 

I I I I I I I I I I I M I I I I I I : I I I I I I I I I I II I I I :: I I I I I I II 1 I I M I I II I I I :: I 

AQAFVP I LAEYKETRSKEATEAFIRHVAGMLSFVLIWTALGILAAPWVI YVSAPGFTKD 
70 80 90 100 110 120 

130 140 150 160 170 180 

ADKFQLSIDLLRITFPYILLISLSSFVGSVLNSYHKFGIPAFTPTFLNVS FIVFALFFVP 

I i M I I I I : I I I I I I I I I I I I I I M I I I I : I M I M I I I I I I I I I I I I : I I I I I I I I I I! 
ADKFQLS I SLLRITFPYILLISLSSFVGS I LNSYHKFGIPAFTPTFLN IS FIVFALFFVP 
130 140 150 160 170 180 

190 200 210 220 230 240 

YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLSFKDAAVNRVMKQMAPAILGV 
I | I I I I I II I I I I I I II I I I I I I I I I 1 I I I I I II I I I I I : I I I I I I I I I I I I I I I I I I I I 
YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV 

190 200 210 220 230 240 

250 260 270 280 290 300 

SVAQVSLVINT I FAS YLQSGSVSWMYYADRMMELPSGVLGAALGT I LLPT LSKHSANQDT 
I I I I : 1 I 1 I I I I I I I I M I I I II I I I I I I I I I I I I I I I I II I II II I I I I I I I I I I I I 
SVAQ I SLV INT I FAS YLQSGSVSWMYYADRMMELRRGVLGAALGT I LLPT LSKHSANQDT 

250 260 270 280 290 300 

310 320 330 340 350 360 

EQFSALLDWGLRLCMLLTLPAAVGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 
I I I I I I I I I I I ! I I I I I I I I I I : I I I I I I I I I I I I II I II I I I I I I I I I I I I I I I I I I I I 
EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 
orf20-1.pep



370 380 390 400 410 420 

LIGLIMIKVLAPGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHVGLSLAIGLGACI 
| | | | ] M I ! I I I j I 1 I 1 I I I I I I I I I I I 1 I I I I I I I I 1 I 1 I 1 I I I I : I I I I I I I I I ! II 
orf20ng-1 LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI
370 380 390 400 410 420 

430 440 450 460 470 480 

orf20-1.pep NAGLLFYLLRRHGIYQPGKGWAAFLAKMLLSLAVMCGGLWAAQAYLPFEWAHAGGMRKAG

| | | | | | : I I I : I I I I : II : I I I I I I I I I I I : I I 11 M I I M M I I I I II I I M I I I I I I 
orf20ng-1 NAGLLFFLLRKHGIYRPGRGWAAFLAKMLLALAVMCGGLWAAQACLPFEWAHAGGMRKAG
430 440 450 460 470 480 

490 500 510

orf20-1.pep QLCILIAVGGGLYFASLAALGFRPRHFKRVENX
I I I I I I I I I I I II I I I I I I I I I I I I I I I I II = I 
orf20ng-1 QLCILIAVGGGLYFASLAALGFRPRHFKRVESX
490 500 510 

In addition, ORF20ng-1 shows significant homology with a virulence factor of S. typhimurium:

sp|P37169|MVIN_SALTY VIRULENCE FACTOR MVIN pir||S40271 mviN protein - Salmonella
typhimurium gi|438252 (Z26133) mviB gene product [Salmonella typhimurium]
gnl|PID|d1005521 (D25292) ORF2 [Salmonella typhimurium] Length = 524
Score = 1573 (750.1 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220
Identities = 309/467 (66%), Positives = 368/467 (78%)



45 



Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sb j ct : 
Query: 
Sbjct: 
Query: 
Sbjct: 
Query: 
Sbjct: 



1 MNMLGALAKVGSLTMVSRVLGFVRDTVIARAFGAGMATDAFFVAFKLPNLLRRVFAEGAF 60 
MN+L +LA V S+TM SB.VLGF RD ++AR FGAGMATDAFFVAFKLPNLLRR+FAEGAF 
14 MNLLKSLAAVSSMTMFSRVLGFARDAIVARIFGAGMATDAFFVAFKLPNLLRRIFAEGAF 73 

61 AQAFVPILAEYKETRSKEATEAFIRHVAGMLSFVLIVVTALGILAAPWVIYVSAPGFTKD 120 

+QAFVPILAEYK + +EAT F+ +V+G+L+ L WT G+LAAPWVI V+APGF 
7 4 SQAFVPILAEYKSKQGEEATRIFVAYVSGLLTLALAWTVAGMLAAPWVIMVTAPGFADT 133 

121 ADKFQLSISLLRITFPYILLISLSSFVGSILNSYHKFGIPAFTPTFLNISFIVFALFFVP 180 

ADKF L+ LLRITFPYILLISL+S VG+ILN++++F IPAF PTFLNIS I FALF P 
134 ADKFALTTQLLRITFPYILLISLASLVGAILNTWNRFSIPAFAPTFLNISMIGFALFAAP 193 

181 YFDPPVTALAWAVFVGGILQLGFQLPWLAKLGFLKLPKLNFKDAAVNRVMKQMAPAILGV 24 0 

YF+PPV ALAWAV VGG+LQL +QLP+L K+G L LP++NF+D RV+KQM PAILGV 
194 YFNPPVLALAWAVTVGGVLQLVYQLPYLKKIGMLVLPRINFRDTGAMRWKQMG PAILGV 253 

241 SVAQISLVINTIFASYLQSGSVSWMYYADRMMELRRGVLGAALGTILLPTLSKHSANQDT 300 

SV+QISL+INTIFAS+L SGSVSWMYYADR+ME GVLG ALGTILLP+LSK A+ + 
254 SVSQISLIINTIFAS FLASGSVSWMYYADRLMEFPSGVLGVALGTILLPSLSKSFASGNH 313 

301 EQFSALLDWGLRLCMLLTLPAAAGLAVLSFPLVATLFMYREFTLFDAQMTQHALIAYSFG 3 60 

+++ L+DWGLRLC LL LP+A L +L+ PL +LF Y +FT FDA MTQ ALIAYS G 
314 DEYCRLMDWGLRLCFLLALPSAVALGILAKPLTVSLFQYGKFTAFDAAMTQRALIAYSVG 37 3 

3 61 LIGLIMIKVLASGFYARQNIKTPVKIAIFTLICTQLMNLAFIGPLKHAGLSLAIGLGACI 420 

LIGLI++KVLA GFY+RQ+ IKT PVK IAI TLI TQLMNLAFIGPLKHAGLSL+IGL AC+ 
374 LIGLIWKVLAPGFYSRQDIKTPVKIAIVTLIMTQLMNLAFIGPLKHAGLSLSIGLAACL 433 

421 NAGLLFFLLRKHGIYRPGRGWXXXXXXXXXXXXVMCGGLWAAQACLP 4 67 

NA LL++ LRK 1+ P GW VM L+ +P 

434 NASLLYWQLRKQNIFTPQPGWMWFLMRLIISVLVMAAVLFGVLHIMP 480 



Score = 70 (33.4 bits), Expect = 1.1e-220, Sum P(2) = 1.1e-220
Identities = 14/41 (34%), Positives = 23/41 (56%)



Query: 469 EWAHAGGMRKAGQLCILIAVGGGLYFASLAALGFRPRHFKR 509

EW+- + + +L ++ G YFA+LA LGF+ + F R 
Sbjct: 481 EWSQGSMLWRLLRLMAVVIAGIAAYFAALAVLGFKVKEFVR 521 
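
The percentage figures in the report above can be cross-checked from the raw counts. The following is a minimal sketch, not part of the original analysis: the function name `check_blast_percentages` and the regular expression are illustrative assumptions. It recomputes each percentage with integer truncation, which is what the figures quoted here appear to be consistent with (368/467 is about 78.8%, reported as 78%); that is an observation from these numbers, not a statement about how the search program computes them.

```python
import re

# Matches e.g. "Identities = 309/467 (66%)" or "Positives = 368/467 (78%)".
STAT = re.compile(r"(Identities|Positives)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")

def check_blast_percentages(line: str):
    """Return (label, matches, length, reported %, recomputed truncated %)."""
    return [
        (label, int(n), int(total), int(pct), (100 * int(n)) // int(total))
        for label, n, total, pct in STAT.findall(line)
    ]

line = "Identities = 309/467 (66%), Positives = 368/467 (78%)"
for row in check_blast_percentages(line):
    print(row)
```

A row whose last two fields differ would point to a transcription problem in the counts or in the percentage.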
Based on this analysis, including the homology with a virulence factor from S. typhimurium, it is
predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 15 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 123>: 

1 atGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGCCGTT tACGACGGCC CGGCCaTTAC CGAAGtCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC 

151 GTcAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT 

201 GTTTACTGCG CCGGCTTCAG GcAAAATCGC CGCGATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAArGCAA CGACGAAATC

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA 

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC 

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC

451 GTCAATGCGA tGGACACCAA TCCG. . 

This corresponds to the amino acid sequence <SEQ ID 124; ORF22>: 

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA 
51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEXNDEI
101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 
151 VNAMDTNP. . 
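
The correspondence between the partial DNA sequence and the amino acid sequence above can be illustrated with a short translation sketch. This is an illustration only: it assumes the standard genetic code read in frame from the first base, the function name `translate` is hypothetical, and it is not the software used in the original analysis. Any codon that is incomplete or contains a non-ACGT symbol is reported as X.

```python
# Standard genetic code, laid out in the conventional TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna: str) -> str:
    """Translate from the first base; a trailing partial codon is dropped."""
    dna = "".join(dna.split()).upper()   # remove the 10-base grouping spaces
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))

# First row of <SEQ ID 123> as listed above (50 bases, so 16 full codons).
first_row = "atGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA"
print(translate(first_row))  # MIKIKKGLNLPIAGRP
```

The output matches the first sixteen residues of ORF22 as listed.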

Further work revealed the complete nucleotide sequence <SEQ ID 125>: 

1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGCCGTT TACGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTCGATGA AAGTCAAGGA AGGCGATGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTGGT 

201 GTTTACTGCG CCGGCTTCAG GCAAAATCGC CGCGATTCAC CGTGGCGAAA 

251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTTGAAC GCTACGCACC TGAAGCGCTG GCAAACTTAA GCGGCGAAGA 

351 AGTGCGCCGC AACCTGATCC AATCCGGTTT GTGGACTGCG CTGCGCACCC 

401 GTCCGTTCAG CAAAATTCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATTAT 

501 CAAAGAAGCC GCCGAGGATT TCAAACGCGG CCTGTTGGTA TTGAGCCGTT 

551 TGACCGAACG CAAAATCCAT GTTTGTAAGG CAGCTGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC 

651 TGCCGGTTTG AGTGGCACGC ACATTCATTT CATCGAGCCG GTCGGCGCGA

701 ATAAAACCGT GTGGACCATC AATTATCAAG ATGTAATTAC CATTGGCCGT 

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CCCTAGGTGG 

801 TTCTCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG 

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACACAGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT 

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT

1051 ACAACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCAACACAGC 

1101 CGTCAACGGC GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 

1151 TGATGCCCTT GGATATCCTG CCCACCCTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA GGCATTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC 

1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA 

This corresponds to the amino acid sequence <SEQ ID 126; ORF22-1>:

1 MIKIKKGLNL PIAGRPEQAV YDGPAITEVA LLGEEYAGMR PSMKVKEGDA
51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI
101 EFERYAPEAL ANLSGEEVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF
151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP
201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVITIGR
251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDTDNRVI
301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR
351 TTLGHFLKNK LFKFNTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV
401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG*
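
The length of the complete ORF can be checked from the row numbering of <SEQ ID 125>: the last row starts at position 1301 and holds 44 bases, giving 1344 bases, that is 448 codons, or 447 residues plus the stop codon. The sketch below shows that arithmetic together with a simple completeness test; the function name `complete_orf_length` and the toy sequence are illustrative assumptions, and the check does not look for internal stop codons.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def complete_orf_length(dna: str) -> int:
    """Residues encoded by a complete ORF, not counting the stop codon."""
    dna = "".join(dna.split()).upper()
    if len(dna) % 3:
        raise ValueError("length is not a multiple of three")
    if not dna.startswith("ATG"):
        raise ValueError("sequence does not start with ATG")
    if dna[-3:] not in STOP_CODONS:
        raise ValueError("sequence does not end with a stop codon")
    return len(dna) // 3 - 1

# Last row of <SEQ ID 125> as listed above, with its starting position.
last_row_start = 1301
last_row = "CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA"
total_nt = last_row_start - 1 + len("".join(last_row.split()))
print(total_nt, total_nt // 3 - 1)            # 1344 447

# Toy ORF (hypothetical): ATG ATT AAA TGA encodes three residues.
print(complete_orf_length("ATGATTAAATGA"))    # 3
```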

Further work identified the corresponding gene in strain A of N. meningitidis <SEQ ID 127>:



1 ATGATTAAAA TCAAAAAAGG TCTAAACCTG CCCATCGCGG GCAGACCGGA 

51 GCAAGTCATT TATGACGGGC CCGTCATTAC CGAAGTCGCG TTGCTTGGCG 

101 AAGAATATGC CGGTATGCGC CCCTNGATGA AAGTCAAGGA AGGCGATGCC 

151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGNATC CGGGCGTGGT 

201 GTTTACCGCG CCNGTTTCAG GCAAAATCGC CGCCATCCAT CGCGGCGAAA

251 AGCGCGTACT TCAGTCGGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC 

301 GAGTTCGAAC GCTACGCGCC CGAAGCGTTG GCAAACTTAA GCGGCGANGA 

351 ANTNNGNNGC AATCTGATCC AATCCGGTTT GTGGACTGCG CTGCGTANCC 

401 GTCCGTTCAG CAAAATCCCT GCCGTCGATG CCGAGCCGTT CGCCATCTTC 

451 GTCAATGCGA TGGACACCAA TCCGCTNGCG GCAGACCCTG TGGTTGTGAT 

501 CAAAGAAGCC GNCGANGATT TCAGACGANG TNTGCTGGTA TTGAGCCGTT 

551 TGACCGAGCG TAAAATCCAT GTGTGTAAGG CAGCTGGCGC AGACGTGCCG

601 TCTGAAAATG CTGCCAACAT CGAAACACAT GAATTCGGCG GCCCGCATCC 

651 GGCCGGTTTG AGTGGCACGC ACATTCATTT CATTGAGCCG GTCGGTGCAA 

701 ACAAAACCGT TTGGACCATC AATTATCAAG ATGTAATTGC CATCGGACGT 

751 TTGTTTGCAA CAGGCCGTCT GAACACCGAG CGCGTGATTG CTTTGGGTGG 

801 TTCTCAAGTC AACAAACCAC GCCTCTTGCG TACCGTTTTG GGTGCGAAAG 

851 TATCGCAAAT TACTGCGGGC GAATTGGTTG ACGCAGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG CGCGATTACA CAAGGCGCGC ACGATTATTT 

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGT

1051 ACGACCCTCG GCCATTTCCT GAAAAACAAA CTCTTCAAGT TCACGACAGC 

1101 CGTCAACGGT GGCGACCGCG CCATGGTGCC GATTGGTACT TACGAGCGCG 

1151 TAATGCCGCT AGACATCCTG CCTACCCTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA AGCATTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATANGGCC 

1301 CGCTGTTGCG TAAGGTGCTG GAAACCNTTG AGAAGGAAGG CTGA 

This encodes a protein having amino acid sequence <SEQ ID 128; ORF22a>: 



1 MIKIKKGLNL PIAGRPEQVI YDGPVITEVA LLGEEYAGMR PXMKVKEGDA 

51 VKKGQVLFED KKXPGVVFTA PVSGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYAPEAL ANLSGXEXXX NLIQSGLWTA LRXRPFSKIP AVDAEPFAIF 

151 VNAMDTNPLA ADPVVVIKEA XXDFRRXXLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR 

251 LFATGRLNTE RVIALGGSQV NKPRLLRTVL GAKVSQITAG ELVDADNRVI 

301 SGSVLNGAIT QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EXGPLLRKVL ETXEKEG* 
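
The X residues in ORF22a arise where the strain A nucleotide sequence contains an undetermined base N. A codon containing N still yields a definite residue when every possible base gives the same amino acid (CCN is always proline), and is otherwise written as X. The sketch below illustrates this rule under the assumption of the standard genetic code; the function name `translate_ambiguous` is hypothetical, and only the symbol N is handled specifically.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Assumption: any symbol other than A, C, G, T is treated like N (any base).
EXPANSION = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}

def translate_ambiguous(dna: str) -> str:
    """Translate in frame; emit X when an ambiguous codon is not unique."""
    dna = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        options = [EXPANSION.get(base, "ACGT") for base in dna[i:i + 3]]
        residues = {CODON_TABLE["".join(c)] for c in product(*options)}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)

# Bases 181-192 of <SEQ ID 127>: NAT could encode N, H, D or Y, hence X.
print(translate_ambiguous("AAAAAGNATCCG"))  # KKXP
# Bases 208-216 of <SEQ ID 127>: CCN is proline whatever N is.
print(translate_ambiguous("GCGCCNGTT"))     # APV
```

Both results agree with residues 61-64 (KKXP) and 70-72 (APV) of ORF22a as listed.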

The originally-identified partial strain B sequence (ORF22) shows 94.2% identity over a 158aa 
overlap with ORF22a: 



orf 22 . pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 

I 1 I I I 1 M I I M I I I I I I : : I I II : I I I I! I I I I I I I I I I I I I I M I I I I I I I I I I I I I 

orf 22a MIKIKKGLNL PIAGRPEQVI YDGPVITEVALLGEEYAGMRPXMKVKEGDA VKKGQVLFED 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 22 . pep KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEXNDEIEFERYAPEALANLSGEEVRR 

II I I I I I I I I = I I M I I I I I II I I I I M I I I I I I II I I I I I II II I I I I I I II I 

o r f 2 2 a KKXPGWFTAPVSGKI AAI HRGEKRVLQS WI AVEGNDE IE FERYAPEALANLSGXEXXX 

70 80 90 100 110 120 

130 140 150 

orf 22 .pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 
I I I I I I I M M I : I I M I I I I I I I I I I U I I M | [ [ [ [ 
o r f 2 2 a NL I QS GLWTALRXRP FSKI PAVDAE PFAI FVNAMDTNPLAAD PWV I KEAXXD FRRXXLV 

130 140 150 160 170 180 
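
Percent identity over an ungapped overlap is the number of positions carrying the same residue divided by the overlap length. The sketch below applies this to the first 60 residues of ORF22 and ORF22a only, so it is not expected to reproduce the 94.2% figure quoted for the full 158 aa overlap. The function name `percent_identity` is hypothetical, and counting an undetermined residue X as a mismatch is an assumption about scoring.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of identical positions in two equal-length, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must cover the same overlap")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)

# Residues 1-60 of ORF22 and ORF22a, taken from the sequences listed above.
orf22 = "MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED"
orf22a = "MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED"
print(f"{percent_identity(orf22, orf22a):.1f}%")  # 93.3%
```

The four differences in this window are at positions 19, 20, 25 and 42, the last being the undetermined residue X in ORF22a.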

The complete strain B sequence (ORF22-1) and ORF22a show 94.9% identity in 447 aa overlap: 
orf22a.pep
orf22-1



MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED 
I I I I I I I I I I I I I II I I I I I I I : I I I I I I I I I I I I I I I I I I I It I I I 1 II II I 1 I I I 
MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 



10 



20 



30 



40 



50 



60 



70 80 90 100 110 120 

KKXPGVVFTAPVSGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGXEXXX 
|| | | | 1 M I I : M I M I I I II M I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I 
KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGEEVRR 

70 80 90 100 110 120 

130 140 150 160 170 180 

NLIQSGLWTALRXRPFSKIPAVDAEPFAIFVNAMDTNPLAADPVWIKEAXXDFRRXXLV 
|| II II I I I I I I : I I I I I I I I I I I M I 1 M II I I I I I I I I I I I = I = I I I I I I : I II 
NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 
130 140 150 160 170 180 

190 200 210 220 230 240 

LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 

I | I I I I || II I I I I I I I I I I I I I I I II I I II 1 11 I I I I I I I I I I II I I I I I I I I M I I II 
LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 

190 200 210 220 230 240 

250 260 270 280 290 300 

NYQDVIAIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGE LVDADNRVI 

II I I I I : II I I I I I I I I I I I I I I I I M I I I I I I I II I I I! I I I I I I M M I I II : I I I I I 
NYQDVITIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDTDNRVI 

250 260 270 280 290 300 

310 320 330 340 350 360 

SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWAPQPDKYSITRTTLGHFLKNK 

I I I I I I I M I I I I I I I I I I I I II I I I I I I I I II I I I I I I I I II I I I I I I II M I I 

SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 

310 320 330 340 350 360 

370 380 390 400 410 420 

LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 
I I I I : i I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 1 I I I I I I I I I I I II I 
LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 

370 380 390 400 410 420 



Further work identified a partial gene sequence <SEQ ID 129> from N. gonorrhoeae, which 
encodes the following amino acid sequence <SEQ ID 130; ORF22ng>: 

1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA 

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI

101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR 

251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI 

301 SGSVLNGAIA QGAHDYLGRY HN* 



Further work identified the complete gonococcal gene <SEQ ID 131>:



1 ATGATTAAAA TCAAAAAAGG TCTAAATCTG CCCATCGCGG GCAGACCGGA
51 GCAAGTCATT TATGACGGCC CGGCCATTAC CGAAGTCGCG TTGCTTGGCG
101 AAGAATATGT CGGCATGCGC CCCTCGATGA AAATCAAGGA AGGTGAAGCC
151 GTCAAAAAAG GCCAAGTGCT GTTTGAAGAC AAAAAGAATC CGGGCGTAGT
201 ATTTACTGCG CCGGCTTCAG GCAAAATCGC CGCTATTCAC CGTGGCGAAA
251 AGCGCGTACT TCAGTCAGTC GTGATTGCCG TTGAAGGCAA CGACGAAATC
301 GAGTTCGAAC GCTACGTACC TGAAGCGCTG GCAAAATTGA GCAGCGAAAA
351 AGTGCGCCGC AACCTGATTC AATCAGGCTT ATGGACTGCG CTTCGCACCC
401 GTCCGTTCAG CAAAATCCCT GCCGTAGATG CCGAGCCGTT CGCCATCTTC

451 GTCAATGCGA TGGACACCAA TCCGCTGGCT GCCGACCCTA CGGTCATCAT 

501 CAAAGAAGCC GCCGAAGACT TCAAACGCGG CCTGTTGGTA TTGAGCCGCC 

551 TGACCGAACG TAAAATCCAT GTGTGTAAAG CAGCAGGCGC AGACGTGCCG 

601 TCTGAAAATG CTGCCAATAT CGAAACACAT GAATTTGGCG GCCCGCATCC 

651 TGCCGGCTTG AGTGGCACGC ACATTCATTT CATCGAGCCA GTCGGCGCGA

701 ATAAAACCGT GTGGACCATC AATTATCAAG ACGTGATTGC TATCGGACGT 

751 TTGTTCGTAA CAGGCCGTCT GAATACCGAG CGCGTGGTTG CCTTGGGCGG 

801 CCTGCAAGTC AACAAACCGC GCCTCTTGCG TACCGTTTTG GGTGCGAAGG 

851 TGTCTCAACT TACCGCCGGC GAATTGGTTG ACGCGGACAA CCGCGTGATT 

901 TCCGGTTCGG TATTGAACGG TGCGATTGCA CAAGGCGCGC ATGATTATTT 

951 GGGACGCTAC CACAATCAGA TTTCCGTTAT CGAAGAAGGC CGCAGCAAAG 

1001 AGCTGTTCGG CTGGGTTGCG CCGCAGCCGG ACAAATACTC CATCACGCGC 

1051 ACCACTCTCG GCCATTTCCT AAAAAACAAA CTCTTCAAGT TCACGACAGC 

1101 CGTCAACGGC GGCGACCGCG CCATGGTACC GATCGGCACT TATGAGCGCG 

1151 TAATGCCGTT GGACATCCTG CCTACCTTGC TTTTGCGCGA TTTAATCGTC 

1201 GGCGATACCG ACAGCGCGCA GGCTTTGGGT TGCTTGGAAT TGGACGAAGA 

1251 AGACCTCGCT TTGTGCAGCT TCGTCTGCCC GGGCAAATAC GAATACGGCC 

1301 CGCTGTTGCG CAAAGTGCTG GAAACCATTG AGAAGGAAGG CTGA 

This encodes a protein having amino acid sequence <SEQ ID 132; ORF22ng-1>:



1 MIKIKKGLNL PIAGRPEQVI YDGPAITEVA LLGEEYVGMR PSMKIKEGEA 

51 VKKGQVLFED KKNPGVVFTA PASGKIAAIH RGEKRVLQSV VIAVEGNDEI 

101 EFERYVPEAL AKLSSEKVRR NLIQSGLWTA LRTRPFSKIP AVDAEPFAIF 

151 VNAMDTNPLA ADPTVIIKEA AEDFKRGLLV LSRLTERKIH VCKAAGADVP

201 SENAANIETH EFGGPHPAGL SGTHIHFIEP VGANKTVWTI NYQDVIAIGR 

251 LFVTGRLNTE RVVALGGLQV NKPRLLRTVL GAKVSQLTAG ELVDADNRVI

301 SGSVLNGAIA QGAHDYLGRY HNQISVIEEG RSKELFGWVA PQPDKYSITR 

351 TTLGHFLKNK LFKFTTAVNG GDRAMVPIGT YERVMPLDIL PTLLLRDLIV 

401 GDTDSAQALG CLELDEEDLA LCSFVCPGKY EYGPLLRKVL ETIEKEG* 



The originally-identified partial strain B sequence (ORF22) shows 93.7% identity over a 158aa 
overlap with ORF22ng: 



or f 22 .pep MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60 

t I I I I I I I I I I I I I I I I I I I I I I I I I [ S I I N I I : I I II I I I : I I I : M I I I I M M I 
orf22ng MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 60 

orf 22 .pep KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEXNDEIEFERYAPEALANLSGEEVRR 120 

I I I I 1 I I I I I I I I I I I II I I I I II I I I I I 1 I I I 1 I I II I I I I I I : I I I I I : I I : I : I I I 
orf22ng KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYVPEALAKLSSEKVRR 120 

orf 22. pep NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158 

I I 1 I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I 
orf22ng NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 180 

The complete sequences from strain B (ORF22-1) and gonococcus (ORF22ng-1) show 96.2%
identity in 447 aa overlap:



MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 
MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 



70 80 90 100 110 120 

KKNPGWFTAPASGKIAAIHRGEKRVLQSWIAVEGNDEIEFERYAPEALANLSGEEVRR 
I > I I I I I I I I I I M I I I I II I I I II I I I I I I I I I I I I I I I I | | | | : | | | II : I I : I : I I I 
KKNPGVVFTAPASGK1AAIHRGEKRVLQSWIAVEGNDEIEFERYVPEALAKLSSEKVRR 

70 80 90 100 110 120 

130 140 150 160 170 180 

NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNPLAADPTVIIKEAAEDFKRGLLV 
130 140 150 160 170 180 



150 



orf 22-1. pep 



190 200 210 220 230 240 

LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 
M | | | | | | | I | || || I I I I II 1 I I II I I I II I I I I I I I I I I I I I I I I I I I I I I I M I I I I 
orf22na-l LSRLTERKIHVCKAAGADVPSENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTVWTI 
9 190 200 210 220 230 240 

10 250 260 270 280 290 300 

orf 22-1 Pep NYQDVITIGRLFATGRLNTERVIALGGSQVNKPRLLRTVLGAKVSQITAGELVDTDNRVI 

| || II I: II II I: INI III II: I II I I I I I I M M I I I I I I I I I : I M II M : I 1 I M 

nrf?2na-l NYQDVIAIGRLFVTGRLNTERWALGGLQVNKPRLLRTVLGAKVSQLTAGELVDADNRVI 
15 9 250 260 270 280 290 300 

310 320 330 340 350 360 

orf 22-1 pep SGSVLNGAITQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 
| | M I I I I I : I M I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II M I I 
20 orf22ng-l SGSVLNGAIAQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFLKNK 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 22-l pep LFKFNTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 

25 * I I I I : I I I I I I I I I I I I I II I I I I I I I 1 M I I I I I I I I I I I I I I I I I I I I I I I I I 

orf22ng-l LFKFTTAVNGGDRAMVPIGTYERVMPLDILPTLLLRDLIVGDTDSAQALGCLELDEEDLA 
370 380 390 400 410 420 

430 440 

30 orf22-l.pep LCS FVCPGKYEYGPLLRKVLET IEKEGX 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf22ng-l LCS FVCPGKYEYGPLLRKVLET IEKEGX 

430 440 

Computer analysis of these sequences gave the following results: 
Homology with 48kDa outer membrane protein of Actinobacillus pleuropneumoniae (accession number U24492).
ORF22 and this 48kDa protein show 72% aa identity in 158aa overlap: 

orf22    1 MIKIKKGLNLPIAGRPEQAVYDGPAITEVALLGEEYAGMRPSMKVKEGDAVKKGQVLFED 60
           MI IKKGL+LPIAG P Q +++G  + EVA+LGEEY GMRPSMKV+EGD VKKGQVLFED
48kDa    1 MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

orf22   61 KKNPGVVFTAPASGKIAAIHRGEKRVLQSVVIAVEXNDEIEFERYAPEALANLSGEEVRR 120
           KKNPGVVFTAPASG +  I+RGEKRVLQSVVI VE +++I F RY    LA+LS E+V++
48kDa   61 KKNPGVVFTAPASGTVVTINRGEKRVLQSVVIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120

orf22  121 NLIQSGLWTALRTRPFSKIPAVDAEPFAIFVNAMDTNP 158
           NLI+SGLWTA RTRPFSK+PA+DA P +IFVNAMDTNP
48kDa  121 NLIESGLWTAFRTRPFSKVPALDAIPSSIFVNAMDTNP 158



ORF22a also shows homology to the 48kDa Actinobacillus pleuropneumoniae protein: 

gi|1185395 (U24492) 48 kDa outer membrane protein [Actinobacillus pleuropneumoniae]
Length = 449

Score = 530 bits (1351), Expect = e-150 

Identities = 274/450 (60%), Positives = 323/450 (70%), Gaps = 4/450 (0%) 

Query: 1   MIKIKKGLNLPIAGRPEQVIYDGPVITEVALLGEEYAGMRPXMKVKEGDAVKKGQVLFED 60
           MI IKKGL+LPIAG P QVI++G  + EVA+LGEEY GMRP MKV+EGD VKKGQVLFED
Sbjct: 1   MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60

Query: 61  KKXPGVVFTAPVSGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYAPEALANLSGXEXXX 120
           KK PGVVFTAP SG +  I+RGEKRVLQSVVI VEG+++I F RY    LA+LS  +
Sbjct: 61  KKNPGVVFTAPASGTVVTINRGEKRVLQSVVIKVEGDEQITFTRYEAAQLASLSAEQVKQ 120
Query:


121 


Sbjct: 


121 


Query: 


181 


Sbjct: 


181 


Query: 


238 


Sbjct: 


241 




298 


Sbjct: 


301 




358 


Sbjct: 


361 


Query: 


418 



NLI+SGLWTA R RPFSK+PA+DA P +IFVNAMDTNPLAADP W+KE 



LSRL — TERKIHVCKAAGADVP-SENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTV 237 
L+RL ++ +++CK A +++P S I F G HPAGL GTHIHF++PVGA K V 

LTRLFNGQKPVYLCKDADSNIPLSPAIEGITIKSFSGVHPAGLVGTHIHFVDPVGATKQV 240 



W +NYQDVIAIG+LF TG L T+R+I+L G QV PRL+RT LGA +SQ+TA EL +N 
WHLNYQDVIAIGKLFTTGELFTDRIISLAGPQVKNPRLVRTRLGANLSQLTANELNAGEN 300 



K KLF FTTAV+GG+RAMVPIG YERVM 



++VCPGK GP+LR LE EKEG 

ORF22ng-l also shows homology with the OMP from A.pleuropneumoniae: 

gi|1185395 (U24492) 48 kDa outer membrane protein [Actinobacillus
pleuropneumoniae] Length = 449

Score = 555 bits (1414), Expect = e-157

Identities = 284/450 (63%), Positives = 337/450 (74%), Gaps = 4/450 (0%)

MIKIKKGLNLPIAGRPEQVIYDGPAITEVALLGEEYVGMRPSMKIKEGEAVKKGQVLFED 8 6 
MI IKKGL+LPIAG P QV1++G + EVA+LGEEYVGMRPSMK++EG+ VKKGQVLFED 
MITIKKGLDLPIAGTPAQVIHNGNTVNEVAMLGEEYVGMRPSMKVREGDVVKKGQVLFED 60 

KKNPGWFTAPASGKIAAIHRGEKRVLQSVVIAVEGNDEIEFERYVPEALAKLSSEKVRR 14 6 
KKNPGWFTAPASG + I+RGEKRVLQSWI VEG+++I F RY LA LS+E+V++ 
KKNPGVVFTAPASGTWTINRGEKRVLQSWrKVEGDEQITFTRYEAAQLASLSAEQVKQ 120 

NL I QSGLWTALRTRPFSKI PAVDAEP FAI FVNAMDTNPLAADPTVI IKEAAED FKRGLLV 2 0 6 
NLI+SGLWTA RTRPFSK+PA+DA P +IFVNAMDTNPLAADP V++KE DFK GL V 
NLIES GLWTAFRTRPFSKVPALDAI PS S I FVNAMDTNPLAADPEWLKEYETDFKDGLT V 180 

LSRL— TERKIHVCKAAGADVP-SENAANIETHEFGGPHPAGLSGTHIHFIEPVGANKTV 2 63 
L+RL ++ +++CK A +++P S I F G HPAGL GTHIHF++PVGA K V 



Score 




Ident 




Query: 


27 


Sbjct: 


1 




87 


Sbjct: 


61 


Query: 


147 


Sbjct: 


121 


Query: 


207 


Sbjct : 


181 




264 


Sbjct: 


241 


Query: 


324 


Sbjct: 


301 




384 


Sbjct: 


361 




444 


Sbjct: 


420 



W +NYQDVIAIG+LF TG L T+R+++L G QV PRL+RT LGA +SQLTA EL +N 
WHLNYQDVIAIGKLFTTGELFTDRIISLAGPQVKNPRLVRTRLGANLSQLTANELNAGEN 3 00 

RVISGSVLNGAIAQGAHDYLGRYHNQISVIEEGRSKELFGWVAPQPDKYSITRTTLGHFL 3 83 
RVISGSVL+GA A G DYLGRY Q+SV+ EGR KELFGW+ P DK+SITRT LGHF 
RVI SG S VL S GATAAGPVD YLGR YALQVS VLAEGREKEL FGWIMPGS DKFS I TRTVLGH FG 3 60 

KNKLFKFTTAVNGGDRAMVPIGTYERVMXXXXXXXXXXXXXXVGDTDSAQXXXXXXXXXX 4 43 
K KLF FTTAV+GG+RAMVPIG YERVM GDTDSAQ 

K-KLFNFTTAVHGGERAMVPIGAYERVMPLDIIPTLLLRDLAAGDTDSAQNLGCLELDEE 419 

XXXXXSFVCPGKYEYGPLLRKVLETIEKEG 4 73 

++VCPGK YGP+LR LE IEKEG 
DLALCTYVCPGKNNYGPMLRAALEKIEKEG 449 

Based on this analysis, including the homology with the outer membrane protein of Actinobacillus 
pleuropneumoniae, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
ORF22-1 (35.4kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
5A shows the results of affinity purification of the GST-fusion protein, and Figure 5B shows the
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure 5C). These
experiments confirm that ORF22-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 16 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 133>: 

1 . . GCGnCGnAAA TCATCCATCC CC . . nACGTC GTAGGCCCTG AAGCCAACTG 

51 GTTTTTTATG GTAGCCAGTA CGTTTGTGAT TGCTTTGATT GGTTATTTTG 

101 TTACTGAAAA AATCGTCGAA CCGCAATTGG GCCCTTATCA ATCAGATTTG

151 TCACAAGAAG AAAAAGACAT TCGGCATTCC AATGAAATCA CGCCTTTGGA 

201 ATATAAAGGA TTAATTTGGG CTGGCGTGGT GTTTGTTGCC TTATCCGCCC 

251 TATTGGCTTG GAGCATCGTC CCTGCCGACG GTATTTTGCG TCATCCTGAA 

301 ACAGGATTGG TTTCCGGTTC GCCGTTTTTA AAATCGATTG TTGTTTTTAT 

351 TTTCTTGTTG TTTGCACTGC CGGGCATTGT TTATGGCCGG GTAACCCGAA 

401 GTTTGCGCGG CGAACAGGAA GTCGTTAATG CGmyGGCCGA ATCGATGAGT

451 ACTCTGGsGC TTTmTTTGsw CAkcATCTTT TTTGCCGCAC AGTTTGTCGC 

501 ATTTTTTAAT TGGACGAATA TTGGGCAATA TATTGCCGTT AAAGGGGCGA 

551 CGTTCTTAAA AGAAGTCGGC TTGGGCGGCA GCGTGTTGTT TATCGGTTTT

601 ATTTTAATTT GTGCTTTTAT CAATCTGATG ATAGGCTCCG CCTCCGCGCA 

651 ATGGGCGGTA ACTGCGCCGA TTTTCGTCCC TATGCTGATG TTGGCCGGCT 

701 ACGCGCCCGA AGTCATTCAA GCCGCTTACC GCATCGGTGA TTCCGTTACC

751 AATATTATTA CGCCGATGAT GAGTTATTTC GGGCTGATTA TGGCGACGGT 

801 GrkCmmmTAC AAAAAAGATG CGGGCGTGGG TaCGcTGATT wCTATGATGT 

851 TGCCGTATTC CGCTTTCTTC TTGATTGCgT GGATTGCCTT ATTCTGCATT 

901 TGGGTATTTg TTTTGGGCCT GCCCGTCGGT CCCGGCGCGC CCACATTCTA 

951 TCCCGCACCT TAA 

This corresponds to the amino acid sequence <SEQ ID 134; ORF12>: 

1 ..AXXIIHPXXV VGPEANWFFM VASTFVIALI GYFVTEKIVE PQLGPYQSDL

51 SQEEKDIRHS NEITPLEYKG LIWAGVVFVA LSALLAWSIV PADGILRHPE

101 TGLVSGSPFL KSIVVFIFLL FALPGIVYGR VTRSLRGEQE VVNAXAESMS

151 TLXLXLXXIF FAAQFVAFFN WTNIGQYIAV KGATFLKEVG LGGSVLFIGF

201 ILICAFINLM IGSASAQWAV TAPIFVPMLM LAGYAPEVIQ AAYRIGDSVT

251 NIITPMMSYF GLIMATVXXY KKDAGVGTLI XMMLPYSAFF LIAWIALFCI

301 WVFVLGLPVG PGAPTFYPAP *

Further sequence analysis revealed the complete DNA sequence <SEQ ID 135> to be: 



1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA
51 ATGGCTGGGC AATATGTTGC CGCATCCGGT TACGCTTTTT ATTATTTTCA
101 TTGTGTTATT GCTGATTGCC TCTGCCGTCG GTGCGTATTT CGGACTATCC
151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT
201 GATTTACATT GTCAGCCTGC TCAATGCCGA CGGTTTTATC AAAATCCTGA
251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG
301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC
351 ATTAATGCGC TTATTGCTCA CAAAATCGCC ACGCAAACTC ACTACTTTTA
401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT
451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA
501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT
551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC
601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC
651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT
701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA
751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC
801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT
851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT
901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT
951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA
1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG
1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT
1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG
1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC
1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC
1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG
1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC
1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC
1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA
1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC
1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC
1551 ATTCTATCCC GCACCTTAA



This corresponds to the amino acid sequence <SEQ ID 136; ORF12-1>:



1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS
51 VPDPRPVGAK GRADDGLIYI VSLLNADGFI KILTHTVKNF TGFAPLGTVL
101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY
151 VVLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT
201 QQAAQIIHPD YVVGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS
251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH
301 PETGLVSGSP FLKSIVVFIF LLFALPGIVY GRVTRSLRGE QEVVNAMAES
351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI
401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS
451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF
501 CIWVFVLGLP VGPGAPTFYP AP*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 
ORF12 shows 96.3% identity over a 320aa overlap with an ORF (ORF12a) from strain A of N.
meningitidis:

10 20 30 

orf12.pep AXXIIHPXXVVGPEANWFFMVASTFVIALI

orf12a AAAFAGVSGGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALI

180 190 200 210 220 230 

40 50 60 70 80 90 

orf 12 . pep GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 



100 110 120 130 140 150 

PADGILRHPETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAXAESMS 

I I I I I I I 1 I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I 

PADGILRHPETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEVWAMAESMS 

300 310 320 330 340 350 

160 170 180 190 200 210 

TLXLXLXXIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM 

II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
TLGLYLVIIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM 

360 370 380 390 400 410 

220 230 240 250 260 270 

IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVXXY 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
I G S AS AQWAVT AP I FVPMLMLAG YAPEVI QAAYRI GDSVTNIIT PMMS Y FGLIMATV I KY 
420 430 440 450 460 470 



280 



290 



300 



310 



320 
orf12.pep KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
| | | | | I I I M I I M M I! I i I I I I M I I I I I M 1 M I I I M I I I I I I I I 1

orf12a KKDAGVGTLISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
480 490 500 510 520 

The complete length ORF12a nucleotide sequence <SEQ ID 137> is: 

1 ATGAGTCAAA CCGATACGCA ACGGGACGGA CGATTTTTAC GCACAGTCGA 

51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGCC TCTGCCGCCG GTGCGTATTT CGGACTATCC

151 GTCCCCGATC CGCGCCCTGT TGGTGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTCACGTT GTCAGCCTGC TCGATGCTGA CGGTTTGATC AAAATCCTGA 

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC 

351 ATTAATGCGC TTATTGCTCA CAAAATCTCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCTAATA CCGCTTCTGA ATTGGGCTAT 

451 GTCGTCCTAA TCCCTTTGTC CGCCATCATC TTTCATTCCC TCGGCCGCCA 

501 TCCGCTTGCC GGTCTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC 

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC 

651 CAACTGGTTT TTTATGGTAG CCAGTACGTT TGTGATTGCT TTGATTGGTT 

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA

751 GATTTGTCAC AAGAAGAAAA AGACATTCGA CATTCCAATG AAATCACGCC

801 TTTGGAATAT AAAGGATTAA TTTGGGCTGG CGTGGTGTTT GTTGCCTTAT 

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT 

901 CCTGAAACAG GATTGGTTTC CGGTTCGCCG TTTTTAAAAT CAATTGTTGT 

951 TTTTATTTTC TTGTTGTTTG CACTGCCGGG CATTGTTTAT GGCCGGGTAA 

1001 CCCGAAGTTT GCGCGGCGAA CAGGAAGTCG TTAATGCGAT GGCCGAATCG 

1051 ATGAGTACTC TGGGGCTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG 

1151 GGGCGACGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGCGT GTTGTTTATC 

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC 

1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG 

1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC 

1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC

1401 GACGGTGATC AAATACAAAA AAGATGCGGG CGTGGGTACG CTGATTTCTA

1451 TGATGTTGCC GTATTCCGCT TTCTTCTTGA TTGCGTGGAT TGCCTTATTC

1501 TGCATTTGGG TATTTGTTTT GGGCCTGCCC GTCGGTCCCG GCGCGCCCAC 

1551 ATTCTATCCC GCACCTTAA 

This encodes a protein having amino acid sequence <SEQ ID 138>: 



1 MSQTDTQRDG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAAGAYFGLS
51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL
101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY
151 VVLIPLSAII FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT
201 QQAAQIIHPD YVVGPEANWF FMVASTFVIA LIGYFVTEKI VEPQLGPYQS
251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH
301 PETGLVSGSP FLKSIVVFIF LLFALPGIVY GRVTRSLRGE QEVVNAMAES
351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGATFLKE VGLGGSVLFI
401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGYAPEV IQAAYRIGDS
451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF
501 CIWVFVLGLP VGPGAPTFYP AP*



ORF12a and ORF12-1 show 99.0% identity in 522 aa overlap: 

                    10        20        30        40        50        60
orf12a.pep  MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAAGAYFGLSVPDPRPVGAK
            ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf12-1     MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf12a.pep  GRADDGLIHVVSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR
            ||||||||::||||:|||:|||||||||||||||||||||||||||||||||||||||||
orf12-1     GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf12a.pep  LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12-1     LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf12a.pep  GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALIGYFVTEKI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12-1     GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALIGYFVTEKI
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf12a.pep  VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12-1     VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf12a.pep  PETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAMAESMSTLGLYLVI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12-1     PETGLVSGSPFLKSIVVFIFLLFALPGIVYGRVTRSLRGEQEVVNAMAESMSTLGLYLVI
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf12a.pep  IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12-1     IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW
                   370       380       390       400       410       420

                   430       440       450       460       470       480
orf12a.pep  AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12-1     AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT
                   430       440       450       460       470       480

                   490       500       510       520
orf12a.pep  LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
            |||||||||||||||||||||||||||||||||||||||||||
orf12-1     LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
                   490       500       510       520
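
By way of illustration only, an identity figure of the kind quoted for the ORF12a/ORF12-1 alignment can be checked with a short script. The Python sketch below is not part of the original analysis: the function names `percent_identity` and `mismatch_positions` are hypothetical, the two strings are the first 80 aligned residues as read from the alignment, and the alignment is assumed to be gap-free so that residues can be compared column by column.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned columns holding the same residue in both sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    identical = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * identical / len(seq_a)


def mismatch_positions(seq_a: str, seq_b: str) -> list:
    """1-based column numbers at which the two aligned sequences differ."""
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]


# First 80 residues of ORF12a and ORF12-1 (assumed gap-free over this window).
ORF12A_N80 = (
    "MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAAGAYFGLSVPDPRPVGAK"
    "GRADDGLIHVVSLLDADGLI"
)
ORF12_1_N80 = (
    "MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK"
    "GRADDGLIYIVSLLNADGFI"
)

print(mismatch_positions(ORF12A_N80, ORF12_1_N80))  # [43, 69, 70, 75, 79]
print(percent_identity(ORF12A_N80, ORF12_1_N80))    # 93.75

# Assumption: if these five substitutions are the only differences across all
# 522 aligned positions, the overall identity is 517/522.
print(round(100.0 * 517 / 522, 1))                  # 99.0
```

All five differences fall within the first 80 residues, which is why the windowed figure is much lower than the 99.0% reported over the full 522 aa overlap.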

Homology with a predicted ORF from N. gonorrhoeae

ORF12 shows 92.5% identity over a 320aa overlap with a predicted ORF (ORF12.ng) from N.
gonorrhoeae:

orf 12. pep AXXIIHPXXWGPEANWFFMVASTFVIALI 30 

I I I I I I II I I I I I I I I : I I I I I I I I I 
orf 12ng AAAFAGVSGGYSANLFLGTIDPLLAGITQQAAQIIHPDYWGPEANWFFMAASTFVIALI 232 

orf12.pep GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIV 90

I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

orf 12ng GYFVTEKIVEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGWFVALSALLAWSIV 2 92 

orf 12 .pep PADGILRHPETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGEQEWNAXAESMS 150 

I I II I I II I I I I I I : I I II I I I I II I I I I I I I I I I I I I I I : I I I I I I I : I M II Mill

orfl2ng PADGILRHPETGLVAGSPFLKSIWFIFLLFALPGIVYGRITRSLRGEREWNAMAESMS 352 

orf 12 .pep TLXLXLXXIFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLM 210 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I : I II : I I I I I i I I I I I I II I I I I I I 

orf12ng TLGLYLVIIFFAAQFVAFFNWTNIGQYIAVKGAVFLKKFRLGGSVLFIGFILICAFINLM 412

orf 12 .pep IGSASAQWAVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNI ITPMMSYFGLIMATVXXY 270 

II II I I I II I II I I I I I I I I I I I I I : I I I I I I I M I ! I I I I I I I M I II I I I I I I I I 
orfl2ng IGSASAQWAVTAPIFVPMLMLAGNAPQVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKY 472 
orf12.pep KKDAGVGTLIXMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAP 320

I | I I I I I I I I I I I I I I I I I I I M I 1 I I I I I I 1 I I I I 1 1 I M : M I I I : I 
orfl2ng KKDAGVGTLISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVP 522 

The complete length ORF12ng nucleotide sequence <SEQ ID 139> is:

1 ATGAGTCAAA CCGACGCGCG TCGTAGCGGA CGATTTTTAC GCACAGTCGA

51 ATGGCTGGGC AATATGTTGC CGCACCCGGT TACGCTTTTT ATTATTTTCA 

101 TTGTGTTATT GCTGATTGcc tctgCCGTCG GTGCGTATTT CGGACTATCC 

151 GTCCCCGATC CGCGTCCTGT TGGGGCGAAA GGACGTGCCG ATGACGGTTT 

201 GATTCACGTT GTCAGCCTGC TCGATGCCGA CGGTTTGATC AAAATCCTGA 

251 CGCATACCGT TAAAAATTTC ACCGGTTTCG CGCCGTTGGG AACGGTGTTG 

301 GTTTCTTTAT TGGGCGTGGG GATTGCGGAA AAATCGGGCT TGATTTCCGC

351 ATTAATGCGC TTATTGCTCA CAAAATCCCC ACGCAAACTC ACTACTTTTA 

401 TGGTTGTTTT TACAGGGATT TTATCCAATA CGGCTTCTGA ATTGGGCTAT 

451 GTCGTCCTAA TCCCTTTGTC CGCCGTCATC TTTCATTCGC TCGGCCGCCA 

501 TCCGCTTGCC GGTTTGGCTG CGGCTTTCGC CGGCGTTTCG GGCGGTTATT 

551 CGGCCAATCT GTTCTTAGGC ACAATCGATC CGCTCTTGGC AGGCATCACC 

601 CAACAGGCGG CGCAAATCAT CCATCCCGAC TACGTCGTAG GCCCTGAAGC

651 CAACTGGTTT TTTATGGCAG CCAGTACGTT TGTGATTGCT TTGATTGGTT

701 ATTTTGTTAC TGAAAAAATC GTCGAACCGC AATTGGGCCC TTATCAATCA

751 GATTTGTCAC AAGAAGAAAA AGACATTCGG CATTCCAATG AAATCACGCC 

801 TTTGGAATAT AAAGGATTAA TTTGGGCAGG CGTGGTGTTT GTTGCCTTAT 

851 CCGCCCTATT GGCTTGGAGC ATCGTCCCTG CCGACGGTAT TTTGCGTCAT 

901 CCTGAAACAG GATTGGTTGC CGGTTCGCCG TTTTTAAAAT CGATTGTTGT 

951 TTTTATTTTC TTGTTGTTTG CGCTGCCGGG CATTGTTTAT GGCCGGATAA 

1001 CCCGAAGTTT GCGCGGCGAA CGGGAAGTCG TTAATGCGAT GGCCGAATCG

1051 ATGAGTACTT TGGGACTTTA TTTGGTCATC ATCTTTTTTG CCGCACAGTT 

1101 TGTCGCATTT TTTAATTGGA CGAATATTGG GCAATATATT GCCGTTAAAG 

1151 GGGCGGTGTT CTTAAAAGAA GTCGGCTTGG GCGGCAGTGT GTTGTTTATC 

1201 GGTTTTATTT TAATTTGTGC TTTTATCAAT CTGATGATAG GCTCCGCCTC
1251 CGCGCAATGG GCGGTAACTG CGCCGATTTT CGTCCCTATG CTGATGTTGG
1301 CCGGCTACGC GCCCGAAGTC ATTCAAGCCG CTTACCGCAT CGGTGATTCC
1351 GTTACCAATA TTATTACGCC GATGATGAGT TATTTCGGGC TGATTATGGC
1401 GACGGTAATC AAATACAAAA AAGATGCGGG CGTAGGCACG CTGATTTCTA
1451 TGATGTTGCC GTATTCCGCT TTCTTCTTAA TTGCATGGAT CGCCTTATTC
1501 TGCATTTGGG TATTTGTTTT GGGTCTGCCC GTCGGTCCCG GCACACCCAC
1551 ATTCTATCCG GTGCCTTAA

This encodes a protein having amino acid sequence <SEQ ID 140>: 

1 MSQTDARRSG RFLRTVEWLG NMLPHPVTLF IIFIVLLLIA SAVGAYFGLS
51 VPDPRPVGAK GRADDGLIHV VSLLDADGLI KILTHTVKNF TGFAPLGTVL
101 VSLLGVGIAE KSGLISALMR LLLTKSPRKL TTFMVVFTGI LSNTASELGY
151 VVLIPLSAVI FHSLGRHPLA GLAAAFAGVS GGYSANLFLG TIDPLLAGIT
201 QQAAQIIHPD YVVGPEANWF FMAASTFVIA LIGYFVTEKI VEPQLGPYQS
251 DLSQEEKDIR HSNEITPLEY KGLIWAGVVF VALSALLAWS IVPADGILRH
301 PETGLVAGSP FLKSIVVFIF LLFALPGIVY GRITRSLRGE REVVNAMAES
351 MSTLGLYLVI IFFAAQFVAF FNWTNIGQYI AVKGAVFLKK FRLGGSVLFI
401 GFILICAFIN LMIGSASAQW AVTAPIFVPM LMLAGNAPQV IQAAYRIGDS
451 VTNIITPMMS YFGLIMATVI KYKKDAGVGT LISMMLPYSA FFLIAWIALF
501 CIWVFVLGLP VGPGTPTFYP VP*

ORF12ng shows 97.1% identity in 522 aa overlap with ORF12-1:

10 20 30 40 50 60 

orf 12-1 . pep MSQTDTQRDGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK 
I I I I I : : I : I I I I I I I I I I I I I I I I I I I I 1 I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl2ng MSQTDARRSGRFLRTVEWLGNMLPHPVTLFIIFIVLLLIASAVGAYFGLSVPDPRPVGAK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 12-1 . pep GRADDGLIYIVSLLNADGFIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR 
I I I I I I I I : : I I I I : I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl2ng GRADDGLIHWSLLDADGLIKILTHTVKNFTGFAPLGTVLVSLLGVGIAEKSGLISALMR 

70 80 90 100 110 120 



                    130       140       150       160       170       180
orf12-1.pep  LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAIIFHSLGRHPLAGLAAAFAGVS
             ||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||||
orf12ng      LLLTKSPRKLTTFMVVFTGILSNTASELGYVVLIPLSAVIFHSLGRHPLAGLAAAFAGVS
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf12-1.pep  GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMVASTFVIALIGYFVTEKI
             ||||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||
orf12ng      GGYSANLFLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMAASTFVIALIGYFVTEKI
                    190       200       210       220       230       240

                    250       260       270       280       290       300
orf12-1.pep  VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf12ng      VEPQLGPYQSDLSQEEKDIRHSNEITPLEYKGLIWAGVVFVALSALLAWSIVPADGILRH
                    250       260       270       280       290       300

310 320 330 340 350 360 

orf 12-1 pep PETGLVSGSPFLKSIWFIFLLFALPGIVYGRVTRSLRGE QEWNAMAESMSTLGLYLVI 

I | I I I 1 : I I I I I I I I I I I I I I I I I I I I II I I I = I I I M M : I I I II I I I I I I 1 

orfl2ng PETGLVAGSPFLKSIWFIFLLFALPGIVYGRITRSLRGEREWNAMAESMSTLGLYLVI 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 12-1 pep IFFAAQFVAFFNWTNIGQYIAVKGATFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW 
I | | | | | I I I I I II M I I i I I I I I I I : I I I I II I I I I I I I I I I I I I I I I I M M I I I I I I I 
orfl2ng IFFAAQFVAFFNWTNIGQYIAVKGAVFLKEVGLGGSVLFIGFILICAFINLMIGSASAQW 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 12-1 pep AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT 
[ M I I I I I I I I I M I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I II M I I I I I I I I I 
orf 12ng AVTAPIFVPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGT 

430 440 450 460 470 480 



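orf12-1.pep  LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGAPTFYPAPX
             ||||||||||||||||||||||||||||||||||:|||||:||
orf12ng      LISMMLPYSAFFLIAWIALFCIWVFVLGLPVGPGTPTFYPVPX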



In addition, ORF12ng shows significant homology with a hypothetical protein from E. coli:

sp|P46133|YDAH_ECOLI HYPOTHETICAL 55.1 KD PROTEIN IN OGT-DBPA INTERGENIC REGION
>gi|1787597 (AE000231) hypothetical protein in ogt 5' region [Escherichia coli]
Length = 510
Score = 329 bits (835), Expect = 2e-89
Identities = 178/507 (35%), Positives = 281/507 (55%), Gaps = 15/507 (2%)

Query: 8   RSGRFLRTVEWLGNMLPHPVTXXXXXXXXXXXASAVGAYFGLSVPDPRPVGAKGRADDGL 67
           +SG+ VE +GN +PHP +A+ + FG+S +P D
Sbjct: 13  QSGKLYGWVERIGNKVPHPFLLFIYLIIVLMVTTAILSAFGVSAKNP TDGTP 64

Query: 68  IHVVSLLDADGLIKILTHTVKNFTGFAPXXXXXXXXXXXXIAEKSGLISALMRLLLTKSP 127
           + V +LL +GL L + +KNF+GFAP +AE+ GL+ ALM + +
Sbjct: 65  VWKNLLSVEGLHWFLPNVIKNFSGFAPLGAILALVLGAGLAERVGLLPALMVKMASHVN 124

Query: 128
           S+ +S+ V++ P+ A+IF ++GRHP+AGL AA AGV G++ANL
Sbjct: 125

Query: 188 FLGTIDPLLAGITQQAAQIIHPDYVVGPEANWFFMAASTFVIALIGYFVTEKIVEPQLGP 247
           + T D LL+GI+ +AA +P V NW+FMA+S V+ ++G +T+KI+EP+LG
Sbjct: 185

Query: 248
           +P +GILR P
Sbjct: 245

Query: 308 GSPFLKSIVVFIFLLFALPGIVYGRITRSLRGEREVVNAMAESMSTLGLYLXXXXXXXXX 367
           SPF+K IV I L F + + YG TR++R + ++ + M E M + ++
Sbjct: 299 PSPFIKGIVPLIILFFFWSLAYGIATRTIRRQADLPHLMIEPMKEMAGFIVMVFPLAQF 358

Query: 368 XXXXNWTNIGQYIAVKGAVFLKEVGLGGSVLFIGFILICAFINLMIGSASAQWAVTAPIF 427
           NW+N +G++IAV L+ GL G F+G L+ +F+ + I S SA W++ APIF
Sbjct: 359 VAMFNWSNMGKFIAVGLTDILESSGLSGIPAFVGLALLSSFLCMFIASGSAIWSILAPIF 418

Query: 428 VPMLMLAGYAPEVIQAAYRIGDSVTNIITPMMSYFGLIMATVIKYKKDAGVGTLISMMLP 487
           VPM ML G+ P Q +RI DS + P+ + L + + +YK DA +GT S++LP
Sbjct: 419 VPMFMLLGFHPAFAQILFRIADSSVLPLAPVSPFVPLFLGFLQRYKPDAKLGTYYSLVLP 478

Query: 488 YSAFFLIAWIALFCIWVFVLGLPVGPG 514
           Y FL+ W+ + W +++GLP+GPG
Sbjct: 479 YPLIFLWWLLMLLAW-YLVGLPIGPG 504

Based on this analysis, including the presence of several putative transmembrane domains and the 
predicted actinin-type actin-binding domain signature (shown in bold) in the gonococcal protein, 
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 17 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 141>: 

1 . . ACAGCCGGCG CAGCAGGTTn CnCGGTCTTC GTTTTCGTAA CGGACAGTCA 

51 GGTGGAGGTG TTCGGGAACA TCCAGACCGC AGTGGAAACA GGTTTTTTTC

101 ATGGCATTTC GGTTTCGTCT GTGTTTGGTG CGGCGGCACA AGACTCGGCA 

151 ATgGCTTCGC GCAGTGCGTC TATACCGGTA TTTTCAGCAA CGGAAATGCG 

201 GACGGcGgCA ATTTTTCCCG CAGCGTCGCG CCATATGCCC GTGTTTTgTT

251 CTTCAGACGG CAGCAGGTCG GTTTTGTTGT ACACCTTgAT GCACGGAaTA 

301 TCGCCGGCAT GGATTTCTTG CAGTACGTTT TCCACGTCTT CAATCTGCTG 

351 TCCGCTGTTC GGAGCGGCGG CATCGACGAC GTGCAGCAGC ACATCgGcTT 

401 gCGCGGTTTC TTCCAGCGTG GCgGAAAAGG CGGAAATCAG TTTgTGCGGC

451 agATyGCTnA CGAATCCGAC GGTATCGGTC AGGATAATGC TGCATTCGGG

501 ACT. . 

This corresponds to the amino acid sequence <SEQ ID 142; ORF14>: 

1 . . TAGAAGXXVF VFVTDSQVEV FGNIQTAVET GFFHGISVSS VFGAAAQDSA 

51 MASRSASIPV FSATEMRTAA IFPAASRHMP VFCSSDGSRS VLLYTLMHGI

101 SPAWISCSTF STSSICCPLF GAAASTTCSS TSACAVSSSV AEKAEISLCG 

151 RXLTNPTVSV RIMLHSG . . 
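
The relationship between the partial DNA sequence and the ORF14 amino acid sequence above, including the X residues, can be illustrated with a minimal translation sketch. This Python example is an illustration only and is not the software used for the original analysis. It assumes the standard genetic code and reading frame 1, and it makes the simplifying assumption that any codon containing an undetermined base (such as `n` or `y`) is reported as X, even where the remaining bases would fix the amino acid. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate frame 1 of a DNA string; unresolved codons become 'X'."""
    seq = "".join(nucleotides.split()).upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 30 bases of SEQ ID 141 as printed above (two undetermined bases).
fragment = "ACAGCCGGCG CAGCAGGTTn CnCGGTCTTC"
print(translate(fragment))  # TAGAAGXXVF
```

The output matches the first ten residues listed for ORF14, with the two codons that contain `n` appearing as X.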

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF14 shows 94.0% identity over a 167aa overlap with an ORF (ORF14a) from strain A of N. 
meningitidis: 

10 20 30 

orf 14 .pep TAGAAGXXVFVFVTDSQVEVFGNIQTAVET 

I : I I I I I I I I I I : I : : I I : 
orf 14a GRQLGFLRVGGALFVITAQARVNNALCDCLTTGAAGFAVFVFVTDGQMQVFGNVQPAVET 
150 160 170 180 190 200 

40 50 60 70 80 90 

orf 14 . pep GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 
100 110 120 130 140 150

orf14.pep VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG

I I I I I I I I ! I I I I I i I 1 1 I I I I I I I I I I I 1 M I I I I I 1 I 1 I I I 1 I N I I 1 I i I I I 1 I I I I 
orfl4a VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 
270 280 290 300 310 320 

160 

orfl4.pep RXLTNPTVSVRIMLHSG 
I I I I I I II I I I I I I I I 

orfl4a RSLTNPTVSVRIMLHSGLMYSRRAWSSVAKSWSFAYMPDLVSRLNRLDLPTLVX 
330 340 350 360 370 380 

The complete length ORF14a nucleotide sequence <SEQ ID 143> is: 

1 ATGGAGGATT TGCAGGAAAT CGGGTTCGAT GTCGCCGCCG TAAAGGTAGG 

51 TCGGCAGCGC GAACATCATC GTCTGCATCA TCCCCAGCCC GGCAACGGCG

101 AGGCGGACGA TGTATTGTTT GCGTTCTTTT TGGTTGGCGG CTTCGATTTT 

151 TTGCGCGTCA TAGGGTGCGG CGGTGTAGCC TATCTGCCTG ATTTTCAACA 

201 GAATGTCGGA AAGGCGGATT TTGCCGTCGT CCCAGACGAC GCGGCAGCGG 

251 TGCGTGCTGT AATTGAGGTC GATGCGGACG ATGCCGTCTG TACGCAAAAG 

301 CTGCTGTTCG ATCAGCCAGA CGCAGGCGGC GCAGGTGATG CCGCCGAGCA 

351 TTAAAACCGC CTCGCGCGTG CCGCCGTGGG TTTCCACAAA GTCGGACTGG 

401 ACTTCGGGCA GGTCGTACAG GCGGATTTGG TCGAGGATTT CTTGGGGCGG 

451 CAGCTCGGTT TTTTGCGCGT CGGCGGTGCG TTGTTTGTAA TAACTGCCCA 

501 AGCCCGCGTC AATAATGCTT TGTGCGACTG CCTGACAACC GGCGCAGCAG 

551 GTTTCGCGGT CTTCGTTTTC GTAACGGACG GTCAGATGCA GGTTTTCGGG 

601 AACGTCCAGC CCGCAGTGGA AACAGGTTTT TTTCATGGCA TTTCGGTTTC 

651 GTCTGTGTTT GGTGCGGCGG CACAATACTC GGCAATGGCT TCGCGCAGTG 

701 CGTCTATACC GGTATTTTCA GCAACGGAAA TGCGGACGGC GGCAATTTTT 

751 CCCGCAGCGT CGCGCCATAT GCCCGTGTTT TGTTCTTCAG ACGGCAGCAG 

801 GTCGGTTTTG TTGTACACCT TGATGCACGG AATATCGCCG GCATGGATTT 

851 CTTGCAGTAC GTTTTCCACG TCTTCAATCT GCTGTCCGCT GTTCGGAGCG 

901 GCGGCATCGA CGACGTGCAG CAGCACATCG GCTTGCGCGG TTTCTTCCAG 

951 CGTGGCGGAA AAGGCGGAAA TCAGTTTGTG CGGCAGATCG CTGACGAATC 

1001 CGACGGTATC GGTCAGGATA ATGCTGCATT CGGGACTGAT GTACAGCCGC

1051 CGCGCCGTCG TGTCGAGTGT GGCGAAAAGC TGGTCTTTCG CATATATGCC

1101 CGACTTGGTC AGCCGGTTGA ACAGACTGGA TTTGCCGACA TTGGTATAG

This encodes a protein having amino acid sequence <SEQ ID 144>: 

1 MEDLQEIGFD VAAVKVGRQR EHHRLHHPQP GNGEADDVLF AFFLVGGFDF 

51 LRVIGCGGVA YLPDFQQNVG KADFAVVPDD AAAVRAVIEV DADDAVCTQK

101 LLFDQPDAGG AGDAAEH*NR LARAAVGFHK VGLDFGQVVQ ADLVEDFLGR

151 QLGFLRVGGA LFVITAQARV NNALCDCLTT GAAGFAVFVF VTDGQMQVFG 

201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF 

251 PAASRHMPVF CSSDGSRSVL LYTLMHGISP AWISCSTFST SSICCPLFGA 

301 AASTTCSSTS ACAVSSSVAE KAEISLCGRS LTNPTVSVRI MLHSGLMYSR 

351 RAVVSSVAKS WSFAYMPDLV SRLNRLDLPT LV* 

It should be noted that this sequence includes a stop codon at position 118. 
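
The position of this internal stop codon can be located directly in the nucleotide sequence. The following Python sketch is illustrative only: the function name `in_frame_stops` and its `first_codon_number` parameter are hypothetical, and the fragment is assumed to be nucleotides 301-360 of SEQ ID 143 as printed above, which correspond to codons 101-120 in reading frame 1.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(nucleotides: str, first_codon_number: int = 1) -> list:
    """Return the codon numbers of stop codons found in reading frame 1."""
    seq = "".join(nucleotides.split()).upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return [
        first_codon_number + index // 3
        for index in range(0, usable, 3)
        if seq[index:index + 3] in STOP_CODONS
    ]


# Nucleotides 301-360 of the ORF14a sequence (codons 101-120).
fragment = (
    "CTGCTGTTCG ATCAGCCAGA CGCAGGCGGC GCAGGTGATG CCGCCGAGCA"
    " TTAAAACCGC"
)
print(in_frame_stops(fragment, first_codon_number=101))  # [118]
```

The TAA codon spans nucleotides 352-354, which is consistent with the asterisk at position 118 in the amino acid sequence.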
Homology with a predicted ORF from N. gonorrhoeae

ORF14 shows 89.8% identity over a 167aa overlap with a predicted ORF (ORF14.ng) from N. 
gonorrhoeae: 

orfl4.pep TAGAAGXXVFVFVTDSQVEVFGNIQTAVET 30 

II III I I : I I : I : I : : I I I I : I MM 
orfl4ng GRQFGFFRVGGAS FVITAQAGI DDALCDCLTADAAGFAVFAFVADGQMQVFGNVQPAVET 208 

orfl4 .pep GFFHGISVSSVFGAAAQDSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 90 

I I I I I I I I I I I I M I I I I II I I I I I I I I I I M II I I I I I I I I I I I I I I I I I II M II I I 
orfl4ng GFFHGISVSSVFGAAAQYSAMASRSASIPVFSATEMRTAAIFPAASRHMPVFCSSDGSRS 2 68 

orfl4 .pep VLLYTLMHGISPAWISCSTFSTSSICCPLFGAAASTTCSSTSACAVSSSVAEKAEISLCG 150 

I I II II I II II I I I I I I I I I I I I I I II I I I I I I I I II : I I I : I I I I I I I II I I 

orfl4ng VLLYTLMHGISWAWISCSTFSTSSICCPLFRAAASTTCSSTSACTVSSKVAEKAEISLCG 328 
orf14.pep RXLTNPTVSVRIMLHSG 167

I I I I I I I I I I I I I I : I 

orfl4ng RSLTNPTVSVRIMLHAGLMYSRRAVVSRVAKSWSFAYMPDLVSRLNRLDLPTLV 382 

The complete length ORF14ng nucleotide sequence <SEQ ID 145> is predicted to encode a protein 
having amino acid sequence <SEQ ID 146>: 

1 MEDLQEIGFD VAAVKVGRQR EHHRLHHTQS GNGKADDVLF AFFLVGGFDF
51 LRVIGCGGVA CLPDFQQNVG EADFAVVPDD AAAVRAVIEV DADDAVCAQK
101 LLFDQPDAGG AGNAAEHQHC FVRAIMGFHK VGLDFGQVVQ ADLVEDFLGR
151 QFGFFRVGGA SFVITAQAGI DDALCDCLTA DAAGFAVFAF VADGQMQVFG
201 NVQPAVETGF FHGISVSSVF GAAAQYSAMA SRSASIPVFS ATEMRTAAIF
251 PAASRHMPVF CSSDGSRSVL LYTLMHGISW AWISCSTFST SSICCPLFRA
301 AASTTCSSTS ACTVSSKVAE KAEISLCGRS LTNPTVSVRI MLHAGLMYSR
351 RAVVSRVAKS WSFAYMPDLV SRLNRLDLPT LV*

Based on the putative transmembrane domain in the gonococcal protein, it is predicted that the 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 



Example 18 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 147>: 

1 . . GGCCATTACT CCGACCGCAC TTGGAAGCCG CGTTTGGNCG GCCGCCGTCT 

51 GCCGTATCTG CTTTATGGCA CGCTGATTGC GGTTATTGTG ATGATTTTGA 

101 TGCCGAACTC GGGCAGCTTC GGTTTCGGCT ATGCGTCGCT GGCGGCTTTG 

151 TCGTTCGGCG CGCTGATGAT TGCGCTGTTA GACGTGTCGT CAAATATGGC

201 GATGCAGCCG TTTAAGATGA TGGTCGGCGA CATGGTCAAC GAGGAGCAGA 

251 AAA.NTACGC CTACGGGATT CAAAGTTTCT TAGCAAATAC GGGCGCGGTC

301 GTGGCGGCGA TTCTGCCGTT TGTGTTTGCG TATATCGGTT TGGCGAACAC 

351 CGCCGANAAA GGCGTTGTGC CGCAGACCGT GGTCGTGGCG TTTTATGTGG 

401 GTGCGGCGTT GCTGGTGATT ACCAGCGCGT TCACGATTTT CAAAGTGAAG

451 GAATACGANC CGGAAACCTA CGCCCGTTAC CACGGCATCG ATGTCGCCGC

501 GAATCAGGAA AAAGCCAACT GGATCGCACT CTTAAAA.CC GCGC . .

This corresponds to the amino acid sequence <SEQ ID 148; ORF16>: 



1 . . GHYSDRTWKP RLXGRRLPYL LYGTLIAVIV MILMPNSGSF GFGYASLAAL 

51 SFGALMIALL DVSSNMAMQP FKMMVGDMVN EEQKXYAYGI QSFLANTGAV

101 VAAILPFVFA YIGLANTAXK GVVPQTVVVA FYVGAALLVI TSAFTIFKVK

151 EYXPETYARY HGIDVAANQE KANWIALLKX A. . 

Further work revealed the complete nucleotide sequence <SEQ ID 149>: 



1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC 

51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG 

101 CCTTTACCCT GCAAAGCTCG CAAATGAGCC GCATTTTTCA AACGCTAGGC 

151 GCAGACCCGC ACAATTTGGG CTGGTTTTTC ATCCTGCCGC CGCTGGCGGG 

201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC 

251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT 

301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG 

351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT 

401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC

451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT

501 CTTAGCAAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG 

551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC

601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC 

651 GTTCACGATT TTCAAAGTGA AGGAATACGA TCCGGAAACC TACGCCCGTT 

701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA

751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT

801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA

851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG

901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC

951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG

1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT 

1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG 

1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG 

1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCTTGTT TAACGGCTCT 

1201 ATCTGTATGC CTCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC

1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC 

1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG 

1351 GTTTGA 



This corresponds to the amino acid sequence <SEQ ID 150; ORF16-1>:

1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG
51 ADPHNLGWFF ILPPLAGMLV QPIVGHYSDR TWKPRLGGRR LPYLLYGTLI
101 AVIVMILMPN SGSFGFGYAS LAALSFGALM IALLDVSSNM AMQPFKMMVG
151 DMVNEEQKGY AYGIQSFLAN TGAVVAAILP FVFAYIGLAN TAEKGVVPQT
201 VVVAFYVGAA LLVITSAFTI FKVKEYDPET YARYHGIDVA ANQEKANWIE
251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ
301 EAGNWYGVLA AVQSVAAVIC SFVLAKVPNK YHKAGYFGCL ALGALGFFSV
351 FFIGNQYALV LSYTLIGIAW AGIITYPLTI VTNALSGKHM GTYLGLFNGS
401 ICMPQIVASL LSFVLFPMLG GLQATMFLVG GVVLLLGAFS VFLIKETHGG
451 V*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF16 shows 96.7% identity over a 181aa overlap with an ORF (ORF16a) from strain A of N. 
meningitidis: 



                                                 10        20        30
orf16.pep                                GHYSDRTWKPRLXGRRLPYLLYGTLIAVIV
                                         |||||||||||| |||||||||||||||||
orf16a     IFQTLGADPHSLGWFFILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIV
               50        60        70        80        90       100

                   40        50        60        70        80        90
orf16.pep  MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKXYAYGI
           |||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||
orf16a     MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGI
              110       120       130       140       150       160

                  100       110       120       130       140       150
orf16.pep  QSFLANTGAVVAAILPFVFAYIGLANTAXKGVVPQTVVVAFYVGAALLVITSAFTIFKVK
           |||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||
orf16a     QSFLANTGAVVAAILPFVFAYIGLANTAEKGVVPQTVVVAFYVGAALLVITSAFTIFKVK
              170       180       190       200       210       220

                  160       170       180
orf16.pep  EYXPETYARYHGIDVAANQEKANWIALLKXA
           || ||||||||||||||||||||||:||| |
orf16a     EYNPETYARYHGIDVAANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAI
              230       240       250       260       270       280

orf16a     AENVWHTTDASSVGYQEAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGA
              290       300       310       320       330       340

The complete length ORF16a nucleotide sequence <SEQ ID 151> is: 



1 ATGTCGGAAT ATACGCCTCA AACAGCAAAA CAAGGTTTGC CCGCGCTGGC

51 AAAAAGCACG ATTTGGATGC TCAGTTTCGG CTTTCTCGGC GTTCAGACGG 

101 CCTTTACCCT GCAAAGCTCG CAGATGAGCC GCATCTTCCA GACGCTCGGT

151 GCCGATCCGC ACAGCCTCGG CTGGTTCTTT ATCCTGCCGC CGCTGGCGGG 

201 GATGCTGGTG CAGCCGATTG TCGGCCATTA CTCCGACCGC ACTTGGAAGC

251 CGCGTTTGGG CGGCCGCCGT CTGCCGTATC TGCTTTATGG CACGCTGATT 

301 GCGGTTATTG TGATGATTTT GATGCCGAAC TCGGGCAGCT TCGGTTTCGG 

351 CTATGCGTCG CTGGCGGCTT TGTCGTTCGG CGCGCTGATG ATTGCGCTGT 

401 TAGACGTGTC GTCAAATATG GCGATGCAGC CGTTTAAGAT GATGGTCGGC



451 GACATGGTCA ACGAGGAGCA GAAAGGCTAC GCCTACGGGA TTCAAAGTTT

501 CTTAGCGAAT ACGGGCGCGG TCGTGGCGGC GATTCTGCCG TTTGTGTTTG 

551 CGTATATCGG TTTGGCGAAC ACCGCCGAGA AAGGCGTTGT GCCGCAGACC 

601 GTGGTCGTGG CGTTTTATGT GGGTGCGGCG TTGCTGGTGA TTACCAGCGC 

651 GTTCACGATT TTCAAAGTGA AGGAATACAA TCCGGAAACC TACGCCCGTT

701 ACCACGGCAT CGATGTCGCC GCGAATCAGG AAAAAGCCAA CTGGATCGAA

751 CTCTTGAAAA CCGCGCCTAA GGCGTTTTGG ACGGTTACTT TGGTGCAATT 

801 CTTCTGCTGG TTCGCCTTCC AATATATGTG GACTTACTCG GCAGGCGCGA 

851 TTGCGGAAAA CGTCTGGCAC ACCACCGATG CGTCTTCCGT AGGTTATCAG

901 GAGGCGGGTA ACTGGTACGG CGTTTTGGCG GCGGTGCAGT CGGTTGCGGC

951 GGTGATTTGT TCGTTTGTAT TGGCGAAAGT GCCGAATAAA TACCATAAGG

1001 CGGGTTATTT CGGCTGTTTG GCTTTGGGCG CGCTCGGCTT TTTCTCCGTT 

1051 TTCTTCATCG GCAACCAATA CGCGCTGGTG TTGTCTTATA CCTTAATCGG 

1101 CATCGCTTGG GCGGGCATTA TCACTTATCC GCTGACGATT GTGACCAACG 

1151 CCTTGTCGGG CAAGCATATG GGCACTTACT TGGGCCTGTT TAACGGCTCT

1201 ATCTGTATGC CGCAAATCGT CGCTTCGCTG TTGAGTTTCG TGCTTTTCCC 

1251 TATGCTGGGC GGCTTGCAGG CCACTATGTT CTTGGTAGGG GGCGTCGTCC 

1301 TGCTGCTGGG CGCGTTTTCC GTGTTCCTGA TTAAAGAAAC ACACGGCGGG 

1351 GTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 152>:

1 MSEYTPQTAK QGLPALAKST IWMLSFGFLG VQTAFTLQSS QMSRIFQTLG
51 ADPHSLGWFF ILPPLAGMLV QPIVGHYSDR TWKPRLGGRR LPYLLYGTLI
101 AVIVMILMPN SGSFGFGYAS LAALSFGALM IALLDVSSNM AMQPFKMMVG
151 DMVNEEQKGY AYGIQSFLAN TGAVVAAILP FVFAYIGLAN TAEKGVVPQT
201 VVVAFYVGAA LLVITSAFTI FKVKEYNPET YARYHGIDVA ANQEKANWIE
251 LLKTAPKAFW TVTLVQFFCW FAFQYMWTYS AGAIAENVWH TTDASSVGYQ
301 EAGNWYGVLA AVQSVAAVIC SFVLAKVPNK YHKAGYFGCL ALGALGFFSV
351 FFIGNQYALV LSYTLIGIAW AGIITYPLTI VTNALSGKHM GTYLGLFNGS
401 ICMPQIVASL LSFVLFPMLG GLQATMFLVG GVVLLLGAFS VFLIKETHGG
451 V*



ORF16a and ORF16-1 show 99.6% identity in 451 aa overlap: 



                    10        20        30        40        50        60
orf16a.pep  MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHSLGWFF
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||
orf16-1     MSEYTPQTAKQGLPALAKSTIWMLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf16a.pep  ILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYAS
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     ILPPLAGMLVQPIVGHYSDRTWKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYAS
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf16a.pep  LAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     LAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAVVAAILP
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf16a.pep  FVFAYIGLANTAEKGVVPQTVVVAFYVGAALLVITSAFTIFKVKEYNPETYARYHGIDVA
            ||||||||||||||||||||||||||||||||||||||||||||||:|||||||||||||
orf16-1     FVFAYIGLANTAEKGVVPQTVVVAFYVGAALLVITSAFTIFKVKEYDPETYARYHGIDVA
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf16a.pep  ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQ
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     ANQEKANWIELLKTAPKAFWTVTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQ
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf16a.pep  EAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     EAGNWYGVLAAVQSVAAVICSFVLAKVPNKYHKAGYFGCLALGALGFFSVFFIGNQYALV
                   310       320       330       340       350       360

                   370       380       390       400       410       420
orf16a.pep  LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf16-1     LSYTLIGIAWAGIITYPLTIVTNALSGKHMGTYLGLFNGSICMPQIVASLLSFVLFPMLG
                   370       380       390       400       410       420

                   430       440       450
orf16a.pep  GLQATMFLVGGVVLLLGAFSVFLIKETHGGVX
            ||||||||||||||||||||||||||||||||
orf16-1     GLQATMFLVGGVVLLLGAFSVFLIKETHGGVX
                   430       440       450



Homology with a predicted ORF from N. gonorrhoeae 

ORF16 shows 93.9% identity over a 181aa overlap with a predicted ORF (ORF16.ng) from N.
gonorrhoeae:

orf 16 pep GHYSDRTWKPRLXGRRLPYLLYGTLIAVIV 3 0 

I : I I I I I I I I I I I I I 1 I I I I I I I I I I I I I 
or f 1 6ng HFSNARRRPAQFGLVFHPAAAGGDAGSADSGYYSDRTWKPRLGGRRLPYLLYGTLIAVIV 131 

orf 1 6 . pep MI LMPNSGS FGFGYAS LAAL S FGALMIALLDVS SNMAMQPFKMMVGDMVNEEQKXYAYGI 90 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I MMI 
orf 1 6ng MILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMAMQPFKMMVGDMVNEEQKSYAYGI 1 91 

orf 16 .pep Q S FLANTGAVVAAI L P FVFAY I GLANT AXKGWPQT WVAFYVGAALLVI T S AFT I FKVK 150 

I M I I I I I M I 1 I I I I I II I I I I I I I I I I ! I I I I I I I I I I I I I I I I : I I 11 M I III 
orfl6ng QSFLANTDAVVAAILPFVFAYIGLANTAEKGWPQTWVAFYVGAALLIITSAFTISKVK 2 51 

orf 16. pep EYXPETYARYHGIDVAANQEKANWIALLKXA 181 

II I I I I I I I I I I I I I M I I I I I I : M I : I 

or f 1 6ng EYDPETYARYHGIDVAANQEKANWFELLKTAPKVFWTVTPVQFFCWFAFRYMWTYSAGAI 311 

The complete length ORF16ng nucleotide sequence <SEQ ID 153> is: 



1 ATGATAGGGG ATCGCCGCGC CGGCAACCAT TTCGGATTTT CCAAAGCAAA 

51 TACTTTTCAA ATCAAAAAAA AGGATTTACT TTATGTCGGA ATATACGCCT 

101 CAAACAGCAA AACAAGGTTT GCCCGCGCCG GCAAAAAGCA CGATTTGGAT 

151 GTTGAGCTTC GGCTATCTCG GCGTTCAGAC GGCCTTTACC CTGCAAAGCT 

201 CGCAGATGAG CCGCATTTTT CAAACGCTAG GCGCAGACCC GCACAATTTG 

251 GGCTGGTTTT TCATCCTGCC GCCGCTGGCG GGGATGCTGG TTCAGCCGAT 

3 01 AGTGGCTACT ACTCAGACCG CACTTGGAAG CCGCGCTTGG GCGGCCGCCG 
351 CCTGCCGTAT CTGCTTTACG GCACGCTGAT TGCGGTCATC GTGATGATTT 

4 01 TGATGCCGAA CTCGGGCAGC TTCGGTTTCG GCTATGCGTC GCTGGCGGCC 
4 51 TTGTCGTTCG GCGCGCT GAT GATTGCGCTG TTGGACGTGT CGTCGAATAT 
501 GGCGATGCAG CCGTTTAAGA TGATGGTCGG CGATATGGTC AACGAGGAGC 
551 AGAAAAGCTA CGCCTACGGG ATTCAAAGTT TCTTAGCGAA TACGGACGCG 
601 GTTGTGGCAG CGATTCTGCC GTTTGTGTTC GCGTATATCG GTTTGGCGAA 
651 CACTGCCGAG AAAGGCGTTG TGCCACAAAC CGTGGTCGTA GCATTCTATG 
7 01 TGGGTGCGGC GTTACTGATT ATTACCAGTG CGTTCACAAT CTCCAAAGTC 

7 51 AAAGAATACG ACCCGGAAAC CTACGCCCGT TACCACGGCA TCGATGTCGC 

8 01 CGCGAATCAG GAAAAAGCCA ACTGGTTCGA ACTCTTAAAA ACCGCGCCTA 
8 51 AAGTGTTTTG GACGGTTACT CCGGTACAGT TTTTCTGCTG GTTCGCCTTC 
901 CGGTATATGT GGACTTACTC GGCAGGCGCG ATTGCAGAAA ACGTCTGGCA 
951 CACTACCGAT GCGTCTTCCG TAGGCCATCA GGAGGCGGGC AACCGGTACG 

1001 GCGTTTTGGC GGCGGTGTAG 

This encodes a protein having amino acid sequence <SEQ ID 154>: 



1 M I GDRRAGNH FGFSKANTFQ IKKKDLLYVG IYASNSKTRF ARAGKKHDLD 

51 VELRLSRRSD GLYPAKLADE PHFSNARRRP AQFGLVFHPA AAGGDAGSAD 

101 SGYYSDRTWK PRLGGRR LPY LLYGTLIAVI VMIL MPNSGS FGFGY ASLAA 

151 LSFGALMIAL LDV SSNMAMQ PFKMMVGDMV NEEQKSYAYG IQSFLANTDA 

201 WAAILPFVF AYIGLAN TAE KGVVPQT WV AFYVGAALLI ITSA FTISKV 

2 51 KEYDPETYAR YHG I DVAANQ EKANWFELLK TAPKVFWTVT PVQFFCWFAF 

301 RYMWTYSAGA IAENVWHTTD ASSVGHQEAG NRYGVLAAV* 
ORF16ng and ORF16-1 show 89.3% identity in 261 aa overlap: 



(In the alignment that follows, the upper sequence of each pair is orf16-1.pep and the lower is orf16ng.) 



MLSFGFLGVQTAFTLQSSQMSRIFQTLGADPHNLGWFFILPPLAGMLVQPI-VGHYSDRT 

I : : I I I II : I : I I I I I 

DVELRLSRRSDGLYPAKLADEPHFSNARRRPAQFGLVF-HPAAAGGDAGSADSGYYSDRT 
50 60 70 80 90 100 

90 100 110 120 130 140 

WKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYASLAALSFGALMIALLDVSSNMA 
| I M I I I I I M I I I I I M I I I I I I I II I M I I I I I M I I I II I I I I I I I I I I I I I M I I I 
WKPRLGGRRLPYLLYGTLIAVIVMILMPNSGSFGFGYASLAALSFGALMIALLDVSSNM7A 
110 120 130 140 150 160 

150 160 170 180 190 200 

MQPFKMMVGDMVNEEQKGYAYGIQSFLANTGAWAAILPFVFAYIGLANTAEKGVVPQTV 

I | | | || I I I I I I I I I I I : I M I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 
MQPFPCMMVGDMVNEEQKSYAYGIQSFLANTDAWAAILPFVFAYIGLANTAEKGWPQTV 
170 180 190 200 210 220 

210 220 230 240 250 260 

VVAFYVGAALLVITSAFTIFKVKEYDPETYARYHGIDVAANQEKANWIELLKTAPPCAFWT 
I || I II I I I I I = I M I I I I I I I II I I I I I I II I I I I I I I I I I I I I I : I I I I I I I I : I I I 
WAFYVGAALLIITSAFTISKVKEYDPETYARYHGIDVAANQEKANWFELLKTAPKVFWT 
230 240 250 260 270 280 

270 280 290 300 310 320 

VTLVQFFCWFAFQYMWTYSAGAIAENVWHTTDASSVGYQEAGNWYGVLAAVQSVAAVICS 
I I I I I II I I I I : I M I I I I I I I I I I I I I I I I I I I I I = I I I I I I I I I I I I 
VTPVQFFCWFAFRYMWTYSAGAIAENVWHTTDASSVGHQEAGNRYGVLAAVX 
290 300 310 320 330 340 



Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
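
The text does not state how the putative transmembrane domains were predicted. As an illustration only, the sketch below applies one common first-pass approach, a Kyte-Doolittle sliding-window hydropathy average, to a 19-residue stretch taken from SEQ ID 154 as printed above. The function name, the 19-residue window, the 1.6 threshold and the neutral score for unknown residues are assumptions for this example, not details from the source.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Return the mean hydropathy of every full window along the protein.

    Unknown residues (for example 'X') score 0.0, an assumption made here
    so that ambiguous positions neither help nor hurt a window.
    """
    scores = [KD.get(aa, 0.0) for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Illustrative segment from SEQ ID 154 (ORF16ng), spaces removed.
segment = "LPYLLYGTLIAVIVMILMP"
profile = hydropathy_profile(segment, window=19)

# A window mean above about 1.6 is a commonly used hint that a
# stretch may be membrane-spanning; it is not proof.
is_candidate = max(profile) > 1.6
print(round(profile[0], 2), is_candidate)
```

A profile like this only flags candidate hydrophobic stretches; any real assignment of transmembrane domains would need a dedicated prediction tool and experimental support.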



Example 19 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 155>: 

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGCATA CCTTGATGCT 

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA 

101 CAATCACCCG NAAACACGTT GNCAAAGACC AAATCCGNGN CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AA.NTGACGG 

251 GNATTTTGAN GGCAGGGCTG GACAAACCCT TCCAAATAGT TNAGGATACC 

301 CCGAGCTATG C.TGCCACCA AGCCCTGCCG GTCAAACTCG GATCGNCTGG 

351 CAGCCAGAAT . . . 

This corresponds to the amino acid sequence <SEQ ID 156; ORF28>: 

1 MLFRKTTAAV LAHTLMLNGC TLMLWGMNNP VSETITRKHV XKDQIRXFGV 

51 VAEDNAQLEK GSLVMMGGKY WFWNPEDSA XXTGILXAGL DKPFQIVXDT 

101 PSYXCHQALP VKLGSXGSQN. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 157>: 

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGCT 

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA 

101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGAAAATAC TGGTTCGTCG TCAATCCCGA AGATTCGGCG AAGCTGACGG 

251 GCATTTTGAA GGCAGGGCTG GACAAACCCT TCCAAATAGT TGAGGATACC 

301 CCGAGCTATG CTCGCCACCA AGCCCTGCCG GTCAAACTCG AATCGCCTGG 

351 CAGCCAGAAT TTCAGTACCG AAGGCCTTTG CCTGCGCTAC GATACCGACA 

401 AGCCTGCCGA CATCGCCAAG CTGAAACAGC TCGGGTTTGA AGCGGTCAAA 

451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA 

501 CTACGCCACA CCGCAAAAAC TGAACGCCGA TTACCATTTT GAGCAAAGTG 

551 TGCCTGCCGA TATTTATTAC ACGGTTACTG AAGAACATAC CGACAAATCC 

601 AAGCTGTTTG CAAATATCTT ATATACGCCC CCCTTTTTGA TACTGGATGC 

651 GGCGGGCGCG GTACTGGCCT TGCCTGCGGC GGCTCTGGGT GCGGTCGTGG 

701 ATGCCGCCCG CAAATGA 
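
The sequences in this document are printed as numbered rows of ten-residue groups. As a minimal sketch, assuming a cleanly transcribed listing, the helper below joins such rows into one contiguous string and checks that each row's leading position number agrees with the residues read so far. The function name `parse_listing` is hypothetical; the two sample rows are the first 100 nucleotides of SEQ ID 157 as printed above.

```python
def parse_listing(text: str) -> str:
    """Join a numbered sequence listing into one contiguous sequence.

    Each non-empty row is expected to look like
    '51 GAACGGCTGT ACGTTGATGT ...': a 1-based start position followed by
    whitespace-separated groups of residues.
    """
    parts: list[str] = []
    length = 0
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank rows between listing lines
        if tokens[0].isdigit():
            start = int(tokens[0])
            if start != length + 1:
                raise ValueError(
                    f"row starts at {start}, expected {length + 1}"
                )
            tokens = tokens[1:]
        chunk = "".join(tokens)
        parts.append(chunk)
        length += len(chunk)
    return "".join(parts)


sample = """
1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGCT

51 GAACGGCTGT ACGTTGATGT TGTGGGGAAT GAACAACCCG GTCAGCGAAA
"""
sequence = parse_listing(sample)
print(len(sequence), sequence[:10])
```

The position check is deliberately strict: a row whose number does not match the running length raises an error, which helps surface transcription damage such as a split position number or a dropped residue group.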

This corresponds to the amino acid sequence <SEQ ID 158; ORF28-1>: 

1 MLFRKTTAAV LAATLMLNG C TLMLWGMNNP VSETITRKHV DKDQIRAFGV 

51 VAEDNAQLEK GSLVMMGGKY WFWNPEDSA KLTGILKAGL DKPFQIVEDT 

101 PSYARHQALP VKLESPGSQN FSTEGLCLRY DTDKPADIAK LKQLGFEAVK 

151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEEHTDKS 

201 KLFANILYTP PF LILDAAGA VLAL PAAAL G AWDAARK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF28 shows 79.2% identity over a 120 aa overlap with an ORF (ORF28a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 2 8 . pep MLFRKTTAAVLAHTLMLNG CTLMLWGMNNPVSETITRKHVXKDQIRXFGWAEDHAQLEK 

I I I I I I I I I I I I I II I I I I I : I : M I I : I III : I I I I I I I I I I I I I I I I I I I II I 
or f 2 8a MLFRKTTAAVLAATLMLNG CTVMMWGMNSPFSETTARKHVDKDQIRAFGWAEDNAQLEK 
10 20 30 40 50 60 



70 80 90 100 110 120 

orf 28 -pep GSLVMMGGKYWFWNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN 

I I I I I I I I I I I I I I I I I I II I II I I I I I ! I I : I : I : : I I I I I II 1 : I I I 
orf 2 8a G S LVMMGGKYWFVVN PE D S AKLTG I LKAGLDKQFQMVE PNPRFA- YQAL PVKLE S PAS QN 

70 80 90 100 110 



The complete length ORF28a nucleotide sequence <SEQ ID 159> is: 



1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATGTT 

51 GAACGGCTGT ACGGTAATGA TGTGGGGTAT GAACAGCCCG TTCAGCGAAA 

101 CGACCGCCCG CAAACACGTT GACAAGGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGGAAATAC TGGTTCGTCG TCAATCCTGA AGATTCGGCG AAGCTGACGG 

251 GCATTTTGAA GGCCGGGTTG GACAAGCAGT TTCAAATGGT TGAGCCCAAC 

301 CCGCGCTTTG CCTACCAAGC CCTGCCGGTC AAACTCGAAT CGCCCGCCAG 

351 CCAGAATTTC AGTACCGAAG GCCTTTGCCT GCGCTACGAT ACCGACAGAC 

401 CTGCCGACAT CGCCAAGCTG AAACAGCTTG AGTTTGAAGC GGTCGAACTC 

451 GACAATCGGA CCATTTACAC GCGCTGCGTC TCCGCCAAAG GCAAATACTA 

501 CGCCACACCG CAAAAACTGA ACGCCGATTA TCATTTTGAG CAAAGTGTGC 

551 CTGCCGATAT TTATTACACG GTTACGAAAA AACATACCGA CAAATCCAAG 

601 TTGTTTGAAA ATATTGCATA TACGCCCACC ACGTTGATAC TGGATGCGGT 

651 GGGCGCGGTG CTGGCCTTGC CTGTCGCGGC GTTGATTGCA GCCACGAATT 

701 CCTCAGACAA ATGA 

This encodes a protein having amino acid sequence <SEQ ID 160>: 



1 MLFRKTTAAV LAATLMLNG C TVMMWGMNSP FSETTARKHV DKDQIRAFGV 

51 VAEDNAQLEK GSLVMMGGKY WFWNPEDSA KLTGILKAGL DKQFQMVEPN 

101 PRFAYQALPV KLESPASQNF STEGLCLRYD TDRPADIAKL KQLEFEAVEL 

151 DNRTIYTRCV SAKGKYYATP QKLNADYHFE QSVPADIYYT VTKKHTDKSK 

201 LFENIAYTPT TL ILDAVGAV LALPVAALIA ATNSSDK* 



ORF28a and ORF28-1 show 86.1% identity in 238 aa overlap: 
orf28a.pep MLFRKTTAAVLAATLMLNGCTVMMWGMNSPFSETTARKHVDKDQIRAFGWAEDNAQLEK 
M I I I I I I I I I I I I I I I I M I : I : I ! I I : I I I I : I I I I I I I I I I I I I 1 I I 1 I I 1 I 1 I I 
orf28-l MLFRKTTAAVLAATLMLNGCTLMLWGMNNPVSETITRKHVDKDQIRAFGVVAEDNAQLEK 
10 20 30 40 50 60 

70 80 90 100 110 119 

orf 28a pep GSLVMMGGKYWFVVNPEDSAKLTGILKAGLDKQFQMVEPNPRFA-YQALPVKLESPASQN 

M I M M I I I I I M I I I I I i I I I I I I I I I I I I I I : I I : I : I : I I I I I I I II I : I I I 

orf28-1 GSLVMMGGKYWFWNPEDSAKLTGILKAGLDKPFQIVEDTPSYARHQALPVKLESPGSQN 

70 80 90 100 110 120 

120 130 140 150 160 170 179 

orf 28a . pep FSTEGLCLRYDTDRPADIAKLKQLEFEAVELDNRTIYTRCVSAKGKYYATPQKLNADYHF 
I I I I I I I I I I II I = I I I I M I I I! I I I I : I II I I I I I I I I I I I I I I I I I I I I I I M I I I 

orf 28-1 FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF 
130 140 150 160 170 180 

180 190 200 210 220 230 

orf28a.pep EQSVPADIYYTVTKKHTDKSKLFENIAYTPTTLILDAVGAVLALPVAALIAATNSSDKX 

I I I II I I I I I I II : : I I I I 1 I I I II III I II I I : I I I I I I I : I I I I : : : : : II 
orf 28-1 EQSVPADIYYTVTEEHTDKSKLFANILYTPPFLILDAAGAVLALPAAALGAWDAARKX 
190 200 210 220 230 

Homology with a predicted ORF from N. gonorrhoeae 

ORF28 shows 84.2% identity over a 120 aa overlap with a predicted ORF (ORF28.ng) from N. 
gonorrhoeae: 

orf 28 .pep MLFRKTTAAVLAHTLMLNGCTLMLWGMNNPVSETITRKHVXKDQIRXFGWAEDNAQLEK 60 
I I I I I I I I I I I I I I : I I I I I : I I I I I I I I I : I I I I I II I I I I I I I I I I I I I I I I I I 
orf28ng MLFRKTTAAVLAATLILNGCTMMLRGMNNPVSQTITRKHVDKDQIRAFGWAEDNAQLEK 60 

orf 28 -pep GSLVMMGGKYWFVVNPEDSAXXTGILXAGLDKPFQIVXDTPSYXCHQALPVKLGSXGSQN 12 0 

I I I I I I I I I I I I : I I I I I I I M : I I I I I I I I I I I I I I I I I I I I I I I : : I I I I 
or f 2 8ng GSLVMMGGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN 12 0 

The complete length ORF28ng nucleotide sequence <SEQ ID 161> is: 

1 ATGTTGTTCC GTAAAACGAC CGCCGCCGTT TTGGCGGCAA CCTTGATACT 

51 GAACGGCTGT ACGATGATGT TGCGGGGGAT GAACAACCCG GTCAGCCAAA 

101 CAATCACCCG CAAACACGTT GACAAAGACC AAATCCGCGC CTTCGGTGTG 

151 GTTGCCGAAG ACAATGCCCA ATTGGAAAAG GGCAGCCTGG TGATGATGGG 

201 CGGGAAATAC TGGTTCGCCG TCAATCCCGA AGATTCGGCG AAGCTGACGG 

251 GCCTTTTGAA GGCCGGGTTG GACAAGCCCT TCCAAATAGT TGAGGATACC 

301 CCGAGCTATG CCCGCCACCA AGCCCTGCCG GTCAAATTCG AAGCGCCCGG 

351 CAGCCAGAAT TTCAGTACCG GAGGTCTTTG CCTGCGCTAT GATACCGGCA 

401 GACCTGACGA CATCGCCAAG CTGAAACAGC TTGAGTTTAA AGCGGTCAAA 

451 CTCGACAATC GGACCATTTA CACGCGCTGC GTATCCGCCA AAGGCAAATA 

501 CTACGCCACG CCGCAAAAAC TGAACGCCGA TTATCATTTT GAGCAAAGTG 

551 TGCCCGCCGA TATTTATTAT ACGGTTACTG AAAAACATAC CGACAAATCC 

601 AAGCTGTTTG GAAATATCTT ATATACGCCC CCCTTGTTGA TATTGGATGC 

651 GGCGGCCGCG GTGCTGGTCT TGCCTATGGC TCTGATTGCA GCCGCGAATT 

701 CCTCAGACAA ATGA 

This encodes a protein having amino acid sequence <SEQ ID 162>: 

1 MLFRKTTAAV LAATLILNG C TMMLRGMNNP VSQTITRKHV DKDQIRAFGV 

51 VAEDNAQLEK GSLVMMGGKY WFAVNPEDSA KLTGLLKAGL DKPFQIVEDT 

101 PSYARHQALP VKFEAPGSQN FSTGGLCLRY DTGRPDDIAK LKQLEFKAVK 

151 LDNRTIYTRC VSAKGKYYAT PQKLNADYHF EQSVPADIYY TVTEKHTDKS 

201 KLFGNILYTP PL LILDAAAA VLVLPMALI A AANSSDK* 

ORF28ng and ORF28-1 share 90.0% identity in 231 aa overlap: 

10 20 30 40 50 60 

orf 2 8-1 . pep MLFRKTTAAVLAATLMLNGCTLMLWGMNNPVSETITRKHVDKDQIRAFGVVAEDNAQLEK 
60 | | | | | | | | | | | | || | : | | | | | : | | I I II I I I : I I I I I I I I I I I I M I I I I I II II I I I I 
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MLFRKTTAAVLAATLILNGCTMMLRGMNNPVSQTITRKHVDKDQIRAFGWAEDNAQLEK 
10 20 30 40 50 60 

, 70 80 90 100 110 120 

GSLVMMGGKYWFWNPEDSAKLTGILKAGLDKPFQIVEDTPSYARHQALPVKLESPGSQN 

| | | | | I I i I I I I : I I I II I I I I I I : I I i I I I I I I I I I I I I I I I I I i M I 1 I I : I : I I I I I 
GSLVMMGGKYWFAVNPEDSAKLTGLLKAGLDKPFQIVEDTPSYARHQALPVKFEAPGSQN 
70 80 90 100 110 120 

130 140 150 160 170 180 

FSTEGLCLRYDTDKPADIAKLKQLGFEAVKLDNRTIYTRCVSAKGKYYATPQKLNADYHF 
Ml I I I I I I II : I I I I I I I I I I : II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
FS TGGLCLRYDTGRPDD I AKLKQLE FKAVKLDNRT I YTRCVS AKGKYYAT PQKLNADYHF 

130 140 150 160 170 180 

190 200 210 220 230 239 

EQSVPADIYYTVTEEHTDKSKLFANILYTPPFLILDAAGAVLALPAAALGAVVDAARKX 
I I I I I I I I I 1 I I I h I I I M I I I : I I I I I M : I I I I I I : I I I : I I I : : I : 

EQSVPADIYYTVTEKHTDKSKLFGNILYTPPLLILDAAAAVLVLPMALIAAANSSDKX 

190 200 210 220 230 



Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



ORF28-1 (24 kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
6A shows the results of affinity purification of the GST-fusion protein, and Figure 6B shows the 
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise 
mice, whose sera were used for ELISA, which gave a positive result. These experiments confirm 
that ORF28-1 is a surface-exposed protein, and that it may be a useful immunogen. 






Example 20 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 163>: 

1 . . GTCAGTCCTG TACTGCCTAT TACACACGAA CGGACAGGGT TTGAAGGTGT 

51 TATCGGTTAT GAAACCCATT TTTCAGGGCA CGGACATGAA GTACACAGTC 

101 CGTTCGATCA TCATGATTCA AAAAGCACTT CTGATTTCAG CGGCGGTGTA 

151 GACGGCGGTT TTACTGTTTA CCAACTTCAT CGAACATGGT CGGAAATCCA 

201 TCCGGAGGAT GAATATGACG GGCCGCAAGC AGCG.ATTAT CCGCCCCCCG 

251 GAGGAGCAAG GGATATATAC AGCTATTATG TCAAAGGAAC TTCAACAAAA 

301 ACAAAGACTA GTATTGTCCC TCAAGCCCCA TTTTCAGACC GTTGGCTAGA 

351 AGAAAATGCC GGTGCCGCCT CTGGT . . 

This corresponds to the amino acid sequence <SEQ ID 164; ORF29>: 

1 . .VSPVLPITHE RTGFEGVIGY ETHFSGHGHE VHSPFDHHDS KSTSDFSGGV 
51 DGGFTVYQLH RTWSEIHPED EYDGPQAAXY PPPGGARDIY SYYVKGTSTK 

101 TKTSIVPQAP FSDRWLEENA GAASG. . 

Further work revealed the complete nucleotide sequence <SEQ ID 165>: 

1 ATGAATTTGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC 

51 GTTGCTGCAA ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAG CGGGTTTACG CCGTCCAGAC 
201 ATTTGATGCA ACTGCGGTCA GTCCTGTACT GCCTATTACA CACGAACGGA 

251 CAGGGTTTGA AGGTGTTATC GGTTATGAAA CCCATTTTTC AGGGCACGGA 

301 CATGAAGTAC ACAGTCCGTT CGATCATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGTGTAGACG GCGGTTTTAC TGTTTACCAA CTTCATCGAA 

401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC 

451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACAGCT ATTATGTCAA 

501 AGGAACTTCA ACAAAAACAA AGACTAATAT TGTCCCTCAA GCCCCATTTT 

551 CAGACCGTTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC 

601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA 

651 TTGGTGGGCT AACCGTATGG ATGATGTTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA 

751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA 

801 AGGTATTAAT GATTTAGGAA AATTAAGTCC GGAAGCACAA CTTGCTGCCG 

851 CGAGCCTATT ACAGGACAGT GCTTTTGCGG TAAAAGACGG TATCAACTCT 

901 GCCAAACAAT GGGCTGATGC CCATCCAAAT ATAACAGCTA CTGCCCAAAC 

951 TGCCCTTTCC GCAGCAGAGG CCGCAGGTAC GGTTTGGAGA GGTAAAAAAG 

1001 TAGAACTTAA CCCGACTAAA TGGGATTGGG TTAAAAATAC CGGTTATAAA 

1051 AAACCTGCTG CCCGCCATAT GCAGACTTTA GATGGGGAGA TGGCAGGTGG 

1101 GAATAAACCT ATTAAATCTT TACCAAACAG TGCCGCTGAA AAAAGAAAAC 

1151 AAAATTTTGA GAAGTTTAAT AGTAACTGGA GTTCAGCAAG TTTTGATTCA 

1201 GTGCACAAAA CACTAACTCC CAATGCACCT GGTATTTTAA GTCCTGATAA 

1251 AGTTAAAACT CGATACACTA GTTTAGATGG AAAAATTACA ATTATAAAAG 

1301 ATAACGAAAA CAACTATTTT AGAATCCATG ATAATTCACG AAAACAGTAT 

1351 CTTGATTCAA ATGGTAATGC TGTGAAAACC GGTAATTTAC AAGGTAAGCA 

1401 AGCAAAAGAT TATTTACAAC AACAAACTCA TATCAGGAAC TTAGACAAAT 

1451 GA 

This corresponds to the amino acid sequence <SEQ ID 166; ORF29-1>: 



1 MNLPIQKFMM LFAAAISLLQ IPISHA NGLD ARLRDDMQAK HYEPGGKYHL 

51 FGNARGSVKK RVYAVQTFDA TAVSPVLPIT HERTGFEGVI GYETHFSGHG 

101 HEVHSPFDHH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS 

151 DYPPPGGARD IYSYYVKGTS TKTKTNIVPQ APFSDRWLKE NAGAASGFFS 

201 RADEAGKLIW ESDPNKNWWA NRMDDVRGIV QGAVNPFLMG FQGVGIGAIT 

251 DSAVSPVTDT AAQQTLQGIN DLGKLSPEAQ LAAASLLQDS AFAVKDGINS 

301 AKQWADAHPN ITATAQTALS AAEAAGTVWR GKKVELNPTK WDWVKNTGYK 

351 KPAARHMQTL DGEMAGGNKP IKSLPNSAAE KRKQNFEKFN SNWSSASFDS 

401 VHKTLTPNAP GILSPDKVKT RYTSLDGKIT IIKDNENNYF RIHDNSRKQY 

451 LDSNGNAVKT GNLQGKQAKD YLQQQTHIRN LDK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF29 shows 88.0% identity over a 125 aa overlap with an ORF (ORF29a) from strain A of N. 
meningitidis: 



10 20 30 

orf 2 9 . pep VSPVLPITHERTGFEGVIGYETHFSGHGHE 

I : I : ! I I I ! I I I I I I I : I I I I I I I I I I I t I 

orf 2 9a EPGGKYHLFGNARGSVKNRVYAVQTFDATAVGPILPITHERTGFEGI IGYETHFSGHGHE 

50 60 70 80 90 100 



40 50 60 70 80 90 

orf 2 9. pep VHSPFDHHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDIY 
i I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I Mill:: I I I I I I I I M I 
orf 2 9a VHSPFDNHDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIY 
110 120 130 140 150 160 



100 110 120 

orf29.pep S YYVKGT STKTKTS IVPQAPFS DRWLEENAGAASG 
I I I I I I I I I I : : I I I : I I II I I I I : I I [ I I I I I 
or f 2 9a XXYVKGT S TKTKSN IV PRAP FS DRWLKENAGAASG FFSRADE AGKL I WE S DPNKNWWANR 

170 180 190 200 210 220 



orf29a 



MDDIRGIVQGAVNPFLMGFQGVGIGAITDSAVSPVTDTAAQQTLQGXNHLGXLSPEAQLA 
230 240 250 260 270 280 
The complete length ORF29a nucleotide sequence <SEQ ID 167> is: 

1 ATGAATTNGC CTATTCAAAA ATTCATGATG CTGTTTGCAG CAGCAATATC 

51 GTNGCTGCAA ATCCCNATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGTAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTACG CCGTCCAAAC 

201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA 

251 CAGGATTTGA AGGCATTATC GGTTATGAAA CCCATTTTTC AGGACATGGA 

301 CATGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGCGTAGACG GTGGTTTTAC CGTTTACCAA CTTCATCGGA 

401 CAGGGTCGGA AATCCATCCG GAGGATGGAT ATGACGGGCC GCAAGGCAGC 

451 GATTATCCGC CCCCCGGAGG AGCAAGGGAT ATATACANNT ANTATGTCAA 

501 AGGAACTTCA ACAAAAACAA AGAGTAATAT TGTTCCCCGA GCCCCATTTT 

551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCCTCTGG TTTTTTCAGC 

601 CGTGCTGATG AAGCAGGAAA ACTGATATGG GAAAGCGACC CCAATAAAAA 

651 TTGGTGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAATGGGT TTTCAAGGAG TAGGGATTGG GGCAATTACA 

751 GACAGTGCAG TAAGCCCGGT CACAGATACA GCCGCGCAGC AGACTCTACA 

801 AGGTATNAAT CATTTAGGAA ANTTAAGTCC CGAAGCACAA CTTGCGGCTG 

851 CAACCGCATT ACAAGACAGT GCTTTTGCGG TAAAAGACGG TATCAATTCC 

901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACTGCAA CAGCCCAAAC 

951 TGCCCTTGCC GTAGCAGANG CCGCAACTAC GGTTTGGGGC GGTAAAAAAG 

1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC NGGCTATAAN 

1051 ACACCTGCTG TTCGCACCAT GCATACTTTG GATGGGGAAA TGGCCGGTGG 

1101 GAATAGACCG CCTAAATCTA TAACGTCCAA CAGCAAAGCA GATGCTTCCA 

1151 CACAACCGTC TTTACAAGCG CAACTAATTG GAGAACAAAT TANNNNNGGG 

1201 CATGCTTATA ACAAGCATGT CATAAGACAA CAAGAATTTA CGGATTTAAA 

1251 TATCAATTCA CCAGCAGATT TTGCTCGGCA TATTGAAAAT ATTGTTAGCC 

1301 ATCCANCAAA TATGAAAGAG TTACCTCGCG GTAGAACTGC GTATTGGGAT 

1351 NATAAAACAG GGACNATAGT TATCCGAGAT AAAAATTCTG ACGATGGAGG 

1401 TACAGCATTT AGACCAACAT CAGGTAAAAA ATATTATGAT GATTTATAG 

This encodes a protein having amino acid sequence <SEQ ID 168>: 

1 MNXPIQKFMM LFAAAISXLQ IPISHA NGLD ARLRDDMQAK HYEPGGKYHL 

51 FGNARGSVKN RVYAVQTFDA TAVGPILPIT HERTGFEGII GYETHFSGHG 

101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGS 

151 DYPPPGGARD IYXXYVKGTS TKTKSNIVPR APFSDRWLKE NAGAASGFFS 

201 RADEAGKLIW ESDPNKNWWA NRMDDIRGIV QGAVNPFLMG FQGVGIGAIT 

251 DSAVSPVTDT AAQQTLQGXN HLGXLSPEAQ LAAATALQDS AFAVKDGINS 

301 ARQWADAHPN ITATAQTALA VAXAATTVWG GKKVELNPTK WDWVKNTGYX 

351 TPAVRTMHTL DGEMAGGNRP PKSITSNSKA DASTQPSLQA QLIGEQIXXG 

401 HAYNKHVIRQ QEFTDLNINS PADFARHIEN IVSHPXNMKE LPRGRTAYWD 

451 XKTGTIVIRD KNSDDGGTAF RPTSGKKYYD DL* 

ORF29a and ORF29-1 show 90.1% identity in 385 aa overlap: 

10 20 30 40 50 60 

MNXPIQKFMMLFAAAISXLQIPISHANGLDARLRDDMQAKHYEPGGKYHLFGNARGSVKN 
II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I : 

MNLP I QKFMMLFAAAI S LLQ I P I SHANGLDARLRD DMQAKHYE PGGKYHL FGNARG SVKK 
10 20 30 40 50 60 

70 80 90 100 110 120 

RVYAVQTFDATAVGPILPITHERTGFEGIIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG 

I I I I I I I I I I I I I : I : I i I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I 
RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG 

70 80 90 100 110 120 

130 140 150 160 170 180 

GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYXXYVKGTS TKTKSNIVPR 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II : I I I I : 
GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDXYSYYVKGTSTKTKTNIVPQ 

130 140 150 160 170 180 

190 200 210 220 230 240 

APFS DRWLKENAGAAS GFFSRADEAGKL I WE S D PNKNWWANRMDD I RG I VQGAVN P FLMG 



(In the unlabelled rows above, the upper sequence of each pair is orf29a.pep and the lower is orf29-1.) 



250 260 270 280 290 300 

orf29a pep FQGVGIGAITDSAVSPVTDTAAQQTLQGXNHLGXLSPEAQLAAATALQDSAFAVKDGINS 

I I I I I I I ! ! I I I I I I i II I I I I I I I I I I I II I I I I M I I I I : II I I II II I I I I I I 
orf29-l FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS 
250 260 270 280 290 300 

310 320 330 340 350 360 

orf29a pep ARQWADAHPNITATAQTALAVAXAATTVWGGKBCVELNPTKWDWVKNTGYXTPAVRTMHTL 
I : I I I I I I I I I I I I I I I! I :: I II III I I I I I I I I I I I I II I I I I I I I : I I : I I 
orf 2 9-1 AKQWADAHPNITATAQTALSAAEAAGTWRGKKVELNPTKWDWVKNTGYKKPAARHMQTL 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 2 9a . pep DGEMAGGNRPPKSITSNSKADASTQPSLQAQLIGEQIXXGHAYNKHVIRQQEFTDLNINS 

I I I I II I I : I I I : III: I 
orf 2 9-1 DGEMAGGNKPIKSLP-NSAAEKRKQNFEKFNSNWSSASFDSVHKTLTPNAPGILSPDKVK 

370 380 390 400 410 

Homology with a predicted ORF from N. gonorrhoeae 

ORF29 shows 88.8% identity over a 125 aa overlap with a predicted ORF (ORF29.ng) from N. 
gonorrhoeae: 

orf 29. pep VSPVLPITHERTGFEGVIGYETHFSGHGHE 30 

I : I : I I I I I I I I I I I I I I I I I M I ! I I I I I 
orf 2 9ng EPGGKYHLFGNARGSVKNRVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHE 102 

orf 2 9 . pep VHSPFDHHDSKSTSDFSGGVDGGFTVYQLHRTWSEIHPEDEYDGPQAAXYPPPGGARDIY 90 

I I I I I I : I II I I I I I I II I I I I II I I I I I! I I I I I I M I Mill:: I I I I 

orf2 9ng VHSPFDNHDSKSTSDFSGGVDGGFTVYQLHRTGSEIHPEDGYDGPQGGGYPPPGGARDIY 162 



orf29.pep SYYVKGTSTKTKTSIVPQAPFSDRWLEENAGAASG 125 
I I :: I I I I I I I I : I I I I I I I I I I I : I I I I I I I I 
orf29ng SYHIKGTSTKTKINTVPQAPFSDRWLKENAGAASGFLSRADEAGKLIWENDPDKNWRANR 222 



The complete length ORF29ng nucleotide sequence <SEQ ID 169> is predicted to encode a protein 
having amino acid sequence <SEQ ID 170>: 

1 MNLPIQKFMM LFAAAISLLQ IPISHA NGLD ARLRDDMQAK HYEPGGKYHL 

51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG 

101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP EDGYDGPQGG 

151 GYPPPGGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS 

201 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGLGVGAIT 

251 DSAVSPVTYA AARKTLQGIH NLGNLSPEAQ LAAATALQDS AFAVKDSINS 

301 ARQWADAHPN ITATAQTALA VTEAATTVWG GKKVELNPAK WDWVKNTGYK 

351 KPAARHMQTV DGEMAGGNKP LESKNTVTTN NFFENTGYTE KVLRQASNGD 

401 YHGFPQSVDA FSENGTVIQI VGGDNIVRHK LYIPGSYKGK DGNFEYIREA 

451 DGKINHRLFV PNQQLPEK* 



In a second experiment, the following DNA sequence <SEQ ID 171> was identified: 

1 atgAATTTGC CTATTCAAAA ATTCATGATG ctgttggcAg cggcaatatc 

51 gatgctGCat ATCCCCATTA GTCATGCGAA CGGTTTGGAT GCCCGTTTGC 

101 GCGATGATAT GCAGGCAAAA CACTACGAAC CGGGTGGCAA ATACCATCTG 

151 TTTGGTAATG CTCGCGGCAG TGTTAAAAAT CGGGTTTGCG CCGTCCAAAC 

201 ATTTGATGCA ACTGCGGTCG GCCCCATACT GCCTATTACA CACGAACGGA 

251 CAGGATTTGA AGGTGTTATC GGCTATGAAA CCCATTTTTC AGGACACGGA 

301 CACGAAGTAC ACAGTCCGTT CGATAATCAT GATTCAAAAA GCACTTCTGA 

351 TTTCAGCGGC GGCGTAGACG GCGGTTTTAC CGTTTACCAA CTTCATCGGA 

401 CAGGGTCGGA AATACATCCC GCAGACGGAT ATGACGGGCC TCAAGGCGGC 

451 GGTTATCCGG AACCACAAGG GGCAAGGGAT ATATACAGCT ACCATATCAA 

501 AGGAACTTCA ACCAAAACAA AGATAAACAC TGTTCCGCAA GCCCCTTTTT 

551 CAGACCGCTG GCTAAAAGAA AATGCCGGTG CCGCTTCCGG TTTTCTCAGC 

601 CGTGCGGATG AAGCAGGAAA ACTGATATGG GAAAACGACC CCGATAAAAA 

651 TTGGCGGGCT AACCGTATGG ATGATATTCG CGGCATCGTC CAAGGTGCGG 

701 TTAATCCTTT TTTAACGGGT TTTCAAGGGG TAGGGATTGG GGCAATTACA 

751 GACAGTGCGG TAAGCCCGGT CACAGATACA GCCGCTCAGC AGACTCTACA 

801 AGGTATTAAT GATTTAGGAA ATTTAAGTCC GGAAGCACAA CTTGCCGCCG 

851 CGAGCCTATT ACAGGACAGT GCCTTTGCGG TAAAAGACGG CATCAATTCC 

901 GCCAGACAAT GGGCTGATGC CCATCCGAAT ATAACAGCAA CAGCCCAAAC 

951 TGCCCTTGCC GTAGCAGAGG CCGCAGGTAC GGTTTGGCGC GGTAAAAAAG 

1001 TAGAACTTAA CCCGACCAAA TGGGATTGGG TTAAAAATAC CGGCTATAAA 

1051 AAACCTGCTG CCCGCCATAT GCAGACTGTA GATGGGGAGA TGGCAGGGGG 

1101 GAATAGACCG CCTAAATCTA TAACGTCGGA AGGAAAAGCT AATGCTGCAA 

1151 CCTATCCTAA GTTGGTTAAT CAGCTAAATG AGCAAAACTT AAATAACATT 

1201 GCGGCTCAAG ATCCAAGATT GAGTCTAGCT ATTCATGAGG GTAAAAAAAA 

1251 TTTTCCAATA GGAACT GCAA CTT AT GAAGA GGCAGATAGA CTAGGTAAAA 

1301 TTTGGGTTGG TGAGGGTGCA AGACAAACTA GTGGAGGCGG ATGGTTAAGT 

1351 AGAGATGGCA CTCGACAATA TCGGCCACCA ACAGAAAAAA AATCACAATT 

1401 TGCAACTACA GGTATTCAAG CAAATTTTGA AACTTATACT ATTGATTCAA 

1451 ATGAAAAAAG AAATAAAATT AAAAATGGAC ATTTAAATAT TAGGTAA 

This encodes a protein having amino acid sequence <SEQ ID 172; ORF29ng-l>: 



1 MNLPIQKFMM LLAAAISMLH IPISHA NGLD ARLRDDMQAK HYEPGGKYHL 

51 FGNARGSVKN RVCAVQTFDA TAVGPILPIT HERTGFEGVI GYETHFSGHG 

101 HEVHSPFDNH DSKSTSDFSG GVDGGFTVYQ LHRTGSEIHP ADGYDGPQGG 

151 GYPEPQGARD IYSYHIKGTS TKTKINTVPQ APFSDRWLKE NAGAASGFLS 

2 01 RADEAGKLIW ENDPDKNWRA NRMDDIRGIV QGAVNPFLTG FQGVGIGAIT 

251 DSAVSPVTDT AAQQTLQGIN DLGNLSPEAQ LAAASLLQDS AFAVKDGINS 

301 ARQWADAHPN ITATAQTALA VAEAAGTVWR GKKVELNPTK WDWVKNTGYK 

351 KPAARHMQTV DGEMAGGNRP PKSITSEGKA NAATYPKLVN QLNEQNLNNI 

4 01 AAQDPRLSLA IHEGKKNFPI GTATYEEADR LGKIWVGEGA RQTSGGGWLS 

451 RDGTRQYRPP TEKKSQFATT GIQANFETYT IDSNEKRNKI KNGHLNIR* 

ORF29ng-1 and ORF29-1 show 86.0% identity in 401 aa overlap: 



10 20 30 40 50 60 

o r f 2 9ng- 1 . pep MNLPIQKFMMLLAAAI SMLH I P I SHANGLDARLRDDMQAKHYE PGGKYHLFGNARGSVKN 

I I I I I I I I I I I : II I I I : I : I I I! I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I : 

orf2 9-l MNL PIQKFMMLFAAAI SLLQ I P I SHANGLDARLRDDMQAKHYE PGGKYHLFGNARGSVKK 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf2 9ng-l.pep RVCAVQTFDATAVGPILPITHERTGFEGVIGYETHFSGHGHEVHSPFDNHDSKSTSDFSG 
II I I I I I I I I I I : I : II I I I! I I I I I I I I I I I I I I I I I I I I II I I I I : I I I I I I I I I II 
orf2 9-l RVYAVQTFDATAVSPVLPITHERTGFEGVIGYETHFSGHGHEVHSPFDHHDSKSTSDFSG 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf2 9ng-l.pep GVDGGFTVYQLHRTGSEIHPADGYDGPQGGGYPEPQGARDIYSYHIKGTSTKTKINTVPQ 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I : II I I I I I I I I I :: I I I I I I I I I III 
orf2 9-l GVDGGFTVYQLHRTGSEIHPEDGYDGPQGSDYPPPGGARDIYSYYVKGTSTKTKTNIVPQ 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf2 9ng-l.pep APFSDRWLKENAGAASGFLSRADEAGKLIWENDPDKNWRANRMDDIRGIVQGAVNPFLTG 
I I I I I I I I I M I I I I I II : I I I I I I I I I I I I : II : I I I I I I 1 I I : I I I I I I I I I I M I 

or f 2 9-1 APFSDRWLKENAGAASGFFSRADEAGKLIWESDPNKNWWANRMDDVRGIVQGAVNPFLMG 
190 200 210 220 230 240 



250 260 270 280 290 300 

orf2 9ng-l.pep FQGVGIGAIT DSAVSPVTDTAAQQTLQGINDLGNLSPEAQLAAASLLQDSAFAVKDGINS 
I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I : I I I I I II I I I ! I I II I I I I I I II I I I 
or f 2 9-1 FQGVGIGAITDSAVSPVTDTAAQQTLQGINDLGKLSPEAQLAAASLLQDSAFAVKDGINS 

250 260 270 280 290 300 



310 320 330 340 350 360 

orf2 9ng-l.pep ARQWADAHPN ITATAQTALA VAEAAGTVWRGKECVELNPTKWDWVKNTGYKKPAARHMQTV 
I : I I I I I I I I I M I I I I ! I : : I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : 
or f 2 9-1 AKQWAD AH PN I T AT AQTAL S AAE AAGT VWRGKKVE LN PTKWDWVKNT GYKKPAARHMQT L 

310 320 330 340 350 360 

370 380 390 400 410 419 
orf29ng-1.pep DGEMAGGNRPPKSI-TSEGKANAATYPKLVNQLNEQNLNNIAAQDPRLSLAIHEGKKNFP 

| | | ] | 1 | | : [ | I : : I : : : : I : : : : : : : : : 

orf29-l DGEMAGGNKPIKSLPNSAAEKRKQNFEKFNSNWSSASFDSVHKTLTPNAPGILSPDKVKT 
370 380 390 400 410 420 

420 430 440 450 460 470 479 

or f 2 9ng-l . pep IGTATYEEADRLGKIWVGEGARQTSGGGWLSRDGTRQYRPPTEKKSQFATTGIQANFETY 

orf29-l RYTSLDGKITIIKDNENNYFRIHDNSRKQYLDSNGNAVKTGNLQGKQAKDYLQQQTHIRN 
430 440 450 460 470 480 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 21 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 173>: 

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC 
51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAATGTTCC 
101 ACACGCGGGC AGATGCACCG ATGCAG. . . 

This corresponds to the amino acid sequence <SEQ ID 174; ORF30>: 

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QMFHTRADAP MQ . . 
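
The correspondence between the partial nucleotide sequence and its amino acid sequence can be checked by translating the DNA in reading frame 1 with the standard genetic code. The sketch below is a minimal illustration; the function name `translate` and the choice to emit `X` for codons containing unresolved bases are assumptions made for this example (the listings in this document use `X` for undetermined residues and `*` for stop).

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace is ignored.

    Codons containing anything other than A, C, G or T (for example N)
    become 'X'. A trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[pos:pos + 3], "X")
        for pos in range(0, usable, 3)
    )


# Partial sequence SEQ ID 173 as printed above (126 nucleotides).
SEQ_ID_173 = (
    "ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC "
    "CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAATGTTCC "
    "ACACGCGGGC AGATGCACCG ATGCAG"
)
print(translate(SEQ_ID_173))
# prints MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ
```

The printed result agrees with the 42 residues listed for ORF30 in SEQ ID 174.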

Further work revealed the complete nucleotide sequence <SEQ ID 175>: 

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC 

51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC 

101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG 

151 ATGAAGGAGA CAGAGGGGGC GTTTCTTCCA TTGGCTATCT TGGGTGGTGC 

201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA 

251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT 

301 CCTGGTGGTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG 

351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA 

401 GAACAGGTCA TCCTATTGGA AAATTTCCCC ATTATCATCG TCGAGTTACG 

451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC 

501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA 

This corresponds to the amino acid sequence <SEQ ID 176; ORF30-1>: 

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE 

51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI 

101 PGGVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT 

151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF30 shows 97.6% identity over a 42aa overlap with an ORF (ORF30a) from strain A of N.
meningitidis:

                    10        20        30        40
orf30.pep   MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ
            |||||||||||||||||||||||||||||||:||||||||||
orf30a      MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP
                    10        20        30        40        50        60

orf30a      LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI
                    70        80        90       100       110       120

The complete length ORF30a nucleotide sequence <SEQ ID 177> is: 

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATTGCCCC
51 CGCAATGGCA AACGGCTTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC
101 ACACGCGGGC AGATGCACCG ATGCAGTTGG CGGAGCTTTC TCAAAAGGAG
151 ATGAAGGANA CAGNGGGGGC GTTTCTTCCA TTGGNTATCT TGGGTGGTGC
201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA
251 GACCAGCTTC TGTTAGAGAT GTTGCTATTG CTGGCGGATT AGGCGCAATT
301 CCTGGTGNTG TAGGCGCCGC AGGAAAGGTT GTTTCCTTTG CTAAATATGG
351 ACGTGAGATT AAAATCGGCA ATAATATGCG GATAGCCCCT TTCGGTAATA
401 GAACAGGTCA TCCTATTGGN AAATTTCCCC ATTATCATCG TCGAGTTACG
451 GATAATACGG GCAAGACTTT GCCTGGACAG GGAATTGGTC GTCATCGCCC
501 TTGGGAATCA AAATCTACGG ACAGATCATG GAAAAACCGC TTCTAA

This encodes a protein having amino acid sequence <SEQ ID 178>: 

1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE
51 MKXTXGAFLP LXILGGAAIG MWTQHGFSYA TTGRPASVRD VAIAGGLGAI
101 PGXVGAAGKV VSFAKYGREI KIGNNMRIAP FGNRTGHPIG KFPHYHRRVT
151 DNTGKTLPGQ GIGRHRPWES KSTDRSWKNR F*

ORF30a and ORF30-1 show 97.8% identity in 181 aa overlap: 

orf30a.pep MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKXTXGAFLP 60

I I I I I I I I I I I I [ I I I I I I I I I I I I I I I I I I I M I [ I I I I I I I I I I I I I I I I I Mill
orf30-1 MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP 60

orf30a.pep LXILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGXVGAAGKVVSFAKYGREI 120

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I ! I M I I I I I I I I I I I I I I I I I I I I
orf30-1 LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI 120

orf30a.pep KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 180

I I I I I I I I I I I I I M I I I I I II I I I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I
orf30-1 KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR 180

orf30a.pep FX

I I

orf30-1 FX

Homology with a predicted ORF from N. gonorrhoeae

ORF30 shows 97.6% identity over a 42aa overlap with a predicted ORF (ORF30.ng) from N. 
gonorrhoeae: 

orf30.pep   MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ 42
            |||||||||||||||||||||||||||||||:||||||||||
orf30ng     MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP 60
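The identity figure quoted for this 42 aa overlap can be reproduced by counting identical columns. The sketch below is illustrative only: it assumes identity is defined as identical residues divided by the number of aligned columns (gaps counted as non-identical), which may differ in detail from the alignment program used for the analysis. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the aligned columns of two equal-length strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("empty alignment")
    # A column counts as identical only if both residues match and are not gaps.
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)


ORF30 = "MKKQITAAVMMLSMIAPAMANGLDNQAFEDQMFHTRADAPMQ"
# First 42 residues of ORF30ng, i.e. the overlap with ORF30 shown above.
ORF30NG_OVERLAP = "MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQ"

print(f"{percent_identity(ORF30, ORF30NG_OVERLAP):.1f}% identity over {len(ORF30)} aa")
```

The two sequences differ at a single position (M versus V at residue 32), so 41 of 42 columns are identical, which rounds to the 97.6% stated in the text.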

The complete length ORF30ng nucleotide sequence <SEQ ID 179> is:

1 ATGAAAAAAC AAATCACCGC AGCCGTAATG ATGCTGTCTA TGATCGCCCC
51 CGCAATGGCA AACGGATTGG ACAATCAGGC ATTTGAAGAC CAAGTGTTCC
101 ACACGCGGGC AGATGCGCCG ATGCAGTTGG CGGAGCTTTC TCAGAAGGAG
151 ATGAAGGAGA CTGAAGGGGC TTTTCTTCCA TTGGCTATCT TGGGTGGTGC
201 TGCCATTGGT ATGTGGACAC AGCATGGTTT TAGTTATGCA ACGACAGGCA
251 GACCAGCTTC TGTTAGAGAT GTTGCTGGCG GATTAGGCGC AATTCCTGGT
301 GATGTAGGTG CTGCAGGAAA GGTTGTTTCC TTTGCTAAAT ATGGACGTGA
351 GATTAAAATC GGCAATAATA TGCGGATAGC CCCTTTCGGT AATAGAACAG
401 GTCATCCTAT TGGAAAATTT CCCCATTATC ATCGTCGAGT TACGGATAAT
451 ACGGGCAAGA CTTTGCCTGG ACAGGGAATT GGTCGTCATC GCCCTTGGGA
501 ATCAAAATCT ACGGACAGAT CATGGAAAAA CCGCTTCTAA

This encodes a protein having amino acid sequence <SEQ ID 180>: 



1 MKKQITAAVM MLSMIAPAMA NGLDNQAFED QVFHTRADAP MQLAELSQKE
51 MKETEGAFLP LAILGGAAIG MWTQHGFSYA TTGRPASVRD VAGGLGAIPG
101 DVGAAGKVVS FAKYGREIKI GNNMRIAPFG NRTGHPIGKF PHYHRRVTDN
151 TGKTLPGQGI GRHRPWESKS TDRSWKNRF*

ORF30ng and ORF30-1 show 98.3% identity in 181 aa overlap: 

10 20 30 40 50 60

orf30ng.pep MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP
| | | | | | | 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I M M I I I I I 1 1 !
orf30-1 MKKQITAAVMMLSMIAPAMANGLDNQAFEDQVFHTRADAPMQLAELSQKEMKETEGAFLP
10 20 30 40 50 60

70 80 90 100 110

orf30ng.pep LAILGGAAIGMWTQHGFSYATTGRPASVRDVA--GGLGAIPGDVGAAGKVVSFAKYGREI

Ill M I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I

orf30-1 LAILGGAAIGMWTQHGFSYATTGRPASVRDVAIAGGLGAIPGGVGAAGKVVSFAKYGREI
70 80 90 100 110 120

120 130 140 150 160 170

orf30ng.pep KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR

1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

orf30-1 KIGNNMRIAPFGNRTGHPIGKFPHYHRRVTDNTGKTLPGQGIGRHRPWESKSTDRSWKNR

130 140 150 160 170 180

180

orf30ng.pep FX
I i

orf30-1 FX

Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 22

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 181>: 

1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT
51 GrTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA
101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT
151 GCACCTGTTT GTg.CGTTaC AAATATCTTT TCTTTTTCTT TATTGGGCTT
201 TTCTTTATGT TTGGCTGTAG GtacGGyCAA TATTGCTTTT GCTGATGGCA
251 TT . .






This corresponds to the amino acid sequence <SEQ ID 182; ORF31>: 



Further work revealed a further partial nucleotide sequence <SEQ ID 183>: 

1 ATGAATAAAA CTCTCTATCG TGTAATTTTC AACCGCAAAC GTGGGGCTGT 

51 GGTAGCCGTT GCTGAAACTA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA 

101 GTGATTCAGG CAGCGCTCAT GTGAAATCTG TTCCTTTTGG TACTACTCAT 

151 GCACCTGTTT GTCGTTCAAA TATCTTTTCT TTTTCTTTAT TGGGCTTTTC 

201 TTTATGTTTG GCTGTAGGTA CGGCCAATAT TGCTTTTGCT GATGGCATT . . 

This corresponds to the amino acid sequence <SEQ ID 184; ORF31-1>:






Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae

ORF31 shows 76.2% identity over a 84aa overlap with a predicted ORF (ORF31.ng) from N. 
gonorrhoeae: 

orf31.pep MNKTLYRVIFNRKRGAVXAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCXVTNIF 60

] | I I I I I I I I I I I I I I I I I I I I I ! I I I I i I I I I I I I : : I I I 1 I II •■■ I

orf31ng MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSGSGSVYVKSVSFIPTH SKAF 54

orf31.pep SFSLLGFSLCLAVGTXNIAFADGI 84

I I 11111111:11 I I I I I I M
orf31ng CFSALGFSLCLALGTVNIAFADGIITDKAAPKTQQATILQTGNGIPQVNIQTPTSAGVSV 114

The complete length ORF31ng nucleotide sequence <SEQ ID 185> is:

1 ATGAACAAAA CCCTCTATCG TGTGATTTTC AACCGCAAAC GCGGTGCTGT
51 GGTAGCTGTT GCCGAAACCA CCAAGCGCGA AGGTAAAAGC TGTGCCGATA
101 GTGGTTCGGG CAGCGTTTAT GTGAAATCCG TTTCTTTCAT TCCTACTCAT
151 TCCAAAGCCT TTTGTTTTTC TGCATTAGGC TTTTCTTTAT GTTTGGCTTT
201 GGGTACGGTC AATATTGCTT TTGCTGACGG CATTATTACT GATAAAGCTG
251 CTCCTAAAAC CCAACAAGCC ACGATTCTGC AAACAGGTaa cGGCATACCG
301 CAAGTCAATA TTCAAACCCC TACTTCGGCA GGGGTTTCTG TTAATCAATA
351 TGCCCAGTTT GATGTGGGTA ATCGCGGGGC GATTTTAAAC AACAGTCGCA
401 GCAACACCCA AACACAGCTA GGCGGTTGGA TTCAAGGCAA TCCTTGGTTG
451 ACAAGGGGCG AAGCACGTGT GGTTGTAAAC CAAATCAACA GCAGCCATCC
501 TTCACAACTG AATGGCTATA TTGAAGTGGG TGGACGACGT GCAGAAGTCG
551 TTATTGCCAA TCCGGCAGGG ATTGCAGTCA ATGGTGGTGG TTTTATCAAT
601 GCTTCCCGTG CCACTTTGAC GACAGGCCAA CCGCAATATC AAGCAGGAGA
651 CTTTAGCGGC TTTAAGATAA GGCAAGGCAA TGCTGTAATC GCCGGACACG
701 GTTTGGATGC CCGTGATACC GATTTCACAC GTATTCTTGT ATGCCAACAA
751 AATCACCTTG ATCAGTACGG CCGAACAAGC AGGCATTCGT AA

This encodes a protein having amino acid sequence <SEQ ID 186>: 



1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSGSGSVY VKSVSFIPTH
51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP
101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL
151 TRGEARVVVN QINSSHPSQL NGYIEVGGRR AEVVIANPAG IAVNGGGFIN
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ
251 NHLDQYGRTS RHS*

This gonococcal protein shares 50% identity over a 149aa overlap with the pore-forming
hemolysin-like HecA protein from Erwinia chrysanthemi (accession number L39897):



orf31ng 96 GNGIPQVNIQTPTSAGVSVNQYAQFDVGNRGAILNNSRSN-TQTQLGGWIQGNPWLTRGE 154 

GNG+P VNI TP ++G+S N+Y F+V NRG ILNN + T +QLGG IQ NP L 
HecA 45 GNGVPWNIATPDASGLSHNRYHDFNVDNRGLILNNGTARLTPSQLGGLIQNNPNLNGRA 104 

Orf31ng 155 ARVVVNQINSSHPSQLNGYIEVGGRRAEVVIANPAGIAVNGGGFINASRATLTTGQPQYQ 214

A ++N++ S + S+L GY+EV G+ A W+ANP GI +G GF+N R TLTTG PQ+ 
HecA 105 AAAILNEWSPNRSRLAGYLEVAGQAANWVANPYGITCSGCGFLNTPRLTLTTGTPQFD 164 

Orf31ng 215 -AGDFSGFKIRQGNAVIAGHGLDARDTDF 242 

AG SG +R G+ +I G GLDA +D+
HecA 165 AAGGLSGLDVRGGDILIDGAGLDASRSDY 193 

Furthermore, ORF31ng and ORF31-1 show 79.5% identity in 83 aa overlap: 



MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSDSGSAHVKSVPFGTTHAPVCRSNIFS
II I I I I I I I I I I I I I I I I II I I I II I I M I I I I I I I I : : I I II I M | : I

MNKTLYRVIFNRKRGAVVAVAETTKREGKSCADSGSGSVYVKSVSFIPTH SKAFC

70 80
orf31-1.pep FSLLGFSLCLAVGTANIAFADGI






I I I I I : I: ' 

orf31ng FSALGFSLCLALGTVNIAFADGIITDKAAPKTQQATILQTGNGIPQVNIQTPTSAGVSVN
60 70 80 90 100 110 

On this basis, including the homology with hemolysins, and also with adhesins, it is predicted that
the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens
for vaccines or diagnostics, or for raising antibodies.

Example 23 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 187>: 

1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA
51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG
101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT
151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA
201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCG. .

This corresponds to the amino acid sequence <SEQ ID 188; ORF32>: 

1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR 
51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT A. . 

Further work revealed the complete nucleotide sequence <SEQ ID 189>: 

1 ATGAATACTC CTCCTTTTGT CTGTTGGATT TTTTGCAAGG TCATCGACAA
51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT CGCCCGTGTT TTGCACCGCG
101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT
151 GCGCTTTGCC CTGATTTGCC CGATGTTCCC TGCGTTCATC AGGATATTCA
201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC
251 CCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG
301 CACATTATCC GCCGACACAA GCCGCTTTGG CTGAATTGGG AATATTTGAG
351 CGCGGAGGAA AGCAATGAAA GGCTGCATCT GATGCCTTCG CCGCAGGAGG
401 GTGTTCAAAA ATATTTTTGG TTTATGGGTT TCAGCGAAAA AAGCGGCGGG
451 TTGATACGCG AACGTGATTA CTGCGAAGCC GTCCGTTTCG ATACTGAAGC
501 CCTGCGAGAG CGGCTGATGC TGCCCGAAAA AAACGCCTCC GAATGGCTGC
551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA
601 CAGGCAGGCA GCCCGATGAC ACTGTTGCTG GCGGGGACGC AAATCATCGA
651 CAGCCTCAAA CAAAGCGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG
701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG
751 CCGCAACAGG ACTTCGACCA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT
801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT
851 TTTGGCACAT CTACCCGCAA GACGAGAATG TCCATCTCGA CAAACTCCAC
901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGTGTCGGC
951 ACACCGCCGT CTTTCGGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA
1001 CACAACGCCT CGAATGTTGG CAAACCCTGC AACAACATCA AAACGGCTGG
1051 CGGCAAGGCG CGGAGGATTG GAGCCGTTAT CTTTTCGGGC AGCCGTCAGC
1101 TCCTGAAAAA CTCGCTGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG

This corresponds to the amino acid sequence <SEQ ID 190; ORF32-1>:



1 MNTPPFVCWI FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR
51 ALCPDLPDVP CVHQDIHVRT WHSDAADIDT APVPDVVIET FACDLPENVL
101 HIIRRHKPLW LNWEYLSAEE SNERLHLMPS PQEGVQKYFW FMGFSEKSGG
151 LIRERDYCEA VRFDTEALRE RLMLPEKNAS EWLLFGYRSD VWAKWLEMWR
201 QAGSPMTLLL AGTQIIDSLK QSGVIPQDAL QNDGDVFQTA SVRLVKIPFV
251 PQQDFDQLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH
301 AFWDKAHGFY TPETVSAHRR LSDDLNGGEA LSATQRLECW QTLQQHQNGW
351 RQGAEDWSRY LFGQPSAPEK LAAFVSKHQK IR*



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF32 shows 93.8% identity over a 81aa overlap with an ORF (ORF32a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf32.pep MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP

| I I I I I I I I [ I I I i II I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I

orf32a MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX

CVHQDIHVRTWHSDAADIDTA
I M II II II I I I I I I I II I I I

CVHQDIHVRTWHSDAADIDTAPVXDVVIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX
70 80 90 100 110 120 



The complete length ORF32a nucleotide sequence <SEQ ID 191> is: 



1 ATGAATACTC CTCCTTTTTC TGCTGGANTT TTTTGCAAGG TCATCGACAA
51 TTTCGGCGAC ATCGGCGTTT CGTGGCGGCT TGCCCGTGTT TTGCACCGCG
101 AACTCGGTTG GCAGGTGCAT TTGTGGACGG ACGATGTGTC CGCCTTGCGT
151 GCGCTTTGCC CTGATTTGCC CGATGTTCNC TGCGTTCATC AGGATATTCA
201 TGTCCGCACT TGGCATTCCG ATGCGGCAGA TATTGATACC GCGCCTGTTC
251 NCGATGTCGT CATCGAAACT TTTGCCTGCG ACCTGCCCGA AAATGTGCTG
301 CACATCATCC GCCGACACAA GCCGCTTTGG CTGAANTGGG AATATTTGAG
351 CGCGGAGGAN AGCAATGAAA GGCTGCACNT GATGCCTTCG CCGCAGGAGA
401 GTGTTCNAAA ATANTTTTGG TTTATGGGTT TCAGCGAANN NAGCGGCGGA
451 CTGATACGCG AACGCGATTA CTGCGAAGCC GTCCGTTTCG ATAGCGGAGC
501 CTTGCGCAAG AGGCTGATGC TTCCCGAAAA AAACGNCCCC GAATGGCTGC
551 TTTTCGGCTA TCGGAGCGAT GTTTGGGCAA AGTGGCTGGA AATGTGGCGA
601 CAGGCAGGCA GTCCGTTGAC ACTTTTGCTG GCNGGGGCGC ANATTATCGA
651 CAGCCTCAAA CAAAACGGCG TTATTCCGCA AGATGCCCTG CAAAACGACG
701 GCGATGTTTT TCAGACGGCA TCCGTCCGCC TCGTCAAAAT CCCTTTCGTG
751 CCGCAACAGG ACTTCGACAA ACTGCTGCAC CTTGCCGACT GCGCCGTCAT
801 CCGCGGCGAA GACAGTTTCG TGCGCGCCCA GCTTGCGGGC AAACCCTTCT
851 TTTGGCACAT CTACCCGCAA GATGAGAATG TCCATCTCGA CAAACTCCAC
901 GCCTTTTGGG ATAAGGCACA CGGTTTCTAC ACGCCCGAAA CCGCATCGGC
951 ACACCGCCGC CTTTCAGACG ACCTCAACGG CGGAGAGGCT TTATCCGCAA
1001 CACAACGCCT CGAATGTTGG CAAATCCTGC AACAACATCA AAACGGCTGG
1051 CGGCAAGGCG CGGAGGATTG GAGCCGTTAT CTTTTTGGGC AGCCTTCCGC
1101 ATCCGAAAAA CTCGCCGCCT TTGTTTCAAA GCATCAAAAA ATACGCTAG

This encodes a protein having amino acid sequence <SEQ ID 192>: 



1 MNTPPFSAGX FCKVIDNFGD IGVSWRLARV LHRELGWQVH LWTDDVSALR
51 ALCPDLPDVX CVHQDIHVRT WHSDAADIDT APVXDVVIET FACDLPENVL
101 HIIRRHKPLW LXWEYLSAEX SNERLHXMPS PQESVXKXFW FMGFSEXSGG
151 LIRERDYCEA VRFDSGALRK RLMLPEKNXP EWLLFGYRSD VWAKWLEMWR
201 QAGSPLTLLL AGAXIIDSLK QNGVIPQDAL QNDGDVFQTA SVRLVKIPFV
251 PQQDFDKLLH LADCAVIRGE DSFVRAQLAG KPFFWHIYPQ DENVHLDKLH
301 AFWDKAHGFY TPETASAHRR LSDDLNGGEA LSATQRLECW QILQQHQNGW
351 RQGAEDWSRY LFGQPSASEK LAAFVSKHQK IR*

ORF32a and ORF32-1 show 93.2% identity in 382 aa overlap: 



10 20 30 40 50 60

orf32-1.pep MNTPPFVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVP

I I I I I I I I I I I I I II I I I I I I I I I I I I I II II I I I I I I I I I I II II I I I I I I I I I

orf32a MNTPPFSAGXFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDVX

10 20 30 40 50 60

70 80 90 100 110 120

orf32-1.pep CVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVLHIIRRHKPLWLNWEYLSAEE

I I I I I I I I I I I I I I I I I I M I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf32a CVHQDIHVRTWHSDAADIDTAPVXDVVIETFACDLPENVLHIIRRHKPLWLXWEYLSAEX

70 80 90 100 110 120

130 140 150 160 170 180

orf32-1.pep SNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNAS

MINI lllllhl I I I II II I I I I I I I I I II I I I II I I I : I II : I II I I I 1 1
orf32a SNERLHXMPSPQESVXKXFWFMGFSEXSGGLIRERDYCEAVRFDSGALRKRLMLPEKNXP
130 140 150 160 170 180

190 200 210 220 230 240

orf32-1.pep EWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQTA
I I M II I I I I I I I I II I I I I I I I I I : I I I I I I : I I I I I I I : I I I I I I I I I I I I I I I I I I
orf32a EWLLFGYRSDVWAKWLEMWRQAGSPLTLLLAGAXIIDSLKQNGVIPQDALQNDGDVFQTA

190 200 210 220 230 240

250 260 270 280 290 300

orf32-1.pep SVRLVKIPFVPQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH
M I I I I I I II I I I I I I = II I I 1 I I I I I i I 1 I I I I I M I I I I I I I I I I I I I I I I I I I I I I I
orf32a SVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKLH

250 260 270 280 290 300

310 320 330 340 350 360

orf32-1.pep AFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSRY
I I I II I I M I I I I I : I I II I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I M
orf32a AFWDKAHGFYTPETASAHRRLSDDLNGGEALSATQRLECWQILQQHQNGWRQGAEDWSRY

310 320 330 340 350 360

370 380
orf32-1.pep LFGQPSAPEKLAAFVSKHQKIRX
II ! I I I I I I I I I I I I II I I I I I I
orf32a LFGQPSASEKLAAFVSKHQKIRX

370 380



Homology with a predicted ORF from N. gonorrhoeae 

ORF32 shows 95.1% identity over a 82aa overlap with a predicted ORF (ORF32.ng) from N. 
gonorrhoeae: 



orf32.pep MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP 57

orf32ng MVMNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLP 60

orf32.pep DVPCVHQDIHVRTWHSDAADIDTA 81

I I I II I I I I I I I I I I I I I I I I I I

orf32ng DVPFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLS 120

An ORF32ng nucleotide sequence <SEQ ID 193> was predicted to encode a protein having amino 
acid sequence <SEQ ID 194>: 



1 MVMNTYAFPV CWIFCKVIDN FGDIGVSWRL ARVLHRELGW QVHLWTDDVS
51 ALRALCPDLP DVPFVHQDIH VRTWHSDAAD IDTAPVPDAV IETFACDLPE
101 NVLNIIRRHK PLWLNWEYLS AEESNERLHL MPSPQEGVQK YFWFMGFSEK
151 SGGLIRERDY REAVRFDTEA LRRRLVLPEK NAPEWLLFGY RGDVWAKWLD
201 MWQQAGSLMT LLLAGAQIID SLKQSGVIPQ NALQNEGGVF QTASVRLVKI
251 PFVPQQDFDK LLHLADCAVI RGEDSFVRTQ LAGKPFFWHI YPQDENVHLD
301 KLHAFWDKAY GFYTPETASV HRLLSDDLNG GEALSATQRL ECGVL*

Further sequencing revealed the following DNA sequence <SEQ ID 195>: 



1 ATGAATACAT ACGCTTTTCC TGTCTGTTGG ATTTTTTGCA AGGTCATCGA
51 CAATTTCGGC GACATCGGCG TTTCGTGGCG GCTCGCCCGT GTTTTGCACC
101 GCGAACTCGG TTGGCAGGTG CATTTGTGGA CGGACGACGT GTCCGCCTTG
151 CGCGCGCTTT GTCCCGATTT GCCCGATGTT CCCTTCGTTC ATCAGGATAT
201 TCATGTCCGC ACTTGGCATT CCGATGCGGC AGACATTGAT ACCGCGCCCG
251 TTCCCGATGC CGTTATCGAA ACTTTTGCCT GCGACCTGCC CGAAAATGTG
301 CTGAACATCA TCCGCCGACA CAAACCGCTT TGGCTGAATT GGGAATATTT
351 GAGCGCGGAG GAAAGCAATG AAAGGCTGCA CCTGATGCCT TCGCCGCAGG
401 AGGGCGTTCA AAAATATTTT TGGTTTATGG GTTTCAGCGA AAAAAGCGGC
451 GGGTTGATAC GCGAACGCGA TTACCGCGAA GCCGTCCGTT TCGATACCGA
501 AGCCCTGCGC CGGCGGCTGG TGCTGCCCGA AAAAAACGCC CCCGAATGGC
551 TGCTTTTCGG CTATCGGGGC GATGTTTGGG CAAAGTGGCT GGACATGTGG
601 CAACAGGCAG GCAGCCTGAT GACCCTACTG CTGGCGGGGG CGCAAATTAT
651 CGACAGCCTC AAACAAAGCG GCGTTATTCC GCAAAACGCC
CTTTCagacG gcafcccgTcC gccttGTCAA AAtcCCGTTC
751 GTGCcGCAAC AGGAcTTCGA CAAATTGCTG CAcctcgcCG ACTGCGCCGT
801 GATACGCGGC GAAGACAGTT TCGTGCGTAC CCAGCTTGCC GGAAAACCCT
851 TTTTTTGGCA CATCTACCCG CAAGACGAGA ATGTCCATCT CGACAAACTC
901 CACGCCTTTT GGGATAAGGC ATACGGCTTC TACACGCCCG AAACCGCATC
951 GGTGCACCGC CTCCTTTCGG ACGACCTCAA CGGCGGAGAG GCTTTATCCG
1001 CAACACAACG CCTCGAATGT TGGCAAACCC TGCAACAACA TCAAAACGGC
1051 TGGCGGCAAG GCGCGGAGGA TTGGAGCCGT TATCTTTTCG GGCAGCCTTC
1101 CGCATCCGAA AAACTCGCCG CCTTTGTTTC AAAGCATCAA AAAATACGCT
1151 AG










This encodes a protein having amino acid sequence <SEQ ID 196; ORF32ng-1>:


1 MNTYAFPVCW IFCKVIDNFG DIGVSWRLAR VLHRELGWQV HLWTDDVSAL
51 RALCPDLPDV PFVHQDIHVR TWHSDAADID TAPVPDAVIE TFACDLPENV
101 LNIIRRHKPL WLNWEYLSAE ESNERLHLMP SPQEGVQKYF WFMGFSEKSG
151 GLIRERDYRE AVRFDTEALR RRLVLPEKNA PEWLLFGYRG DVWAKWLDMW
201 QQAGSLMTLL LAGAQIIDSL KQSGVIPQNA LQNEGGVFQT ASVRLVKIPF
251 VPQQDFDKLL HLADCAVIRG EDSFVRTQLA GKPFFWHIYP QDENVHLDKL
301 HAFWDKAYGF YTPETASVHR LLSDDLNGGE ALSATQRLEC WQTLQQHQNG
351 WRQGAEDWSR YLFGQPSASE KLAAFVSKHQ KIR*




ORF32ng-1 and ORF32-1 show 93.5% identity in 383 aa overlap:








10 20 30 40

MNTPPF-VCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV
II! I I I I I I I II I I I I I I I I I I I I I I I I M I I I I I I I I I I I 1 I I II I I I I I I I I I I I
MNTYAFPVCWIFCKVIDNFGDIGVSWRLARVLHRELGWQVHLWTDDVSALRALCPDLPDV

10 20 30 50 60 60

70 80 90 100 110 119

PCVHQDIHVRTWHSDAADIDTAPVPDVVIETFACDLPENVLHIIRRHKPLWLNWEYLSAE

I I I I I I I I I I I I I I I II I I I 1 I I I I : I II I I I I II I I I II : I M I I I I I I I I I I I I I I I
PFVHQDIHVRTWHSDAADIDTAPVPDAVIETFACDLPENVLNIIRRHKPLWLNWEYLSAE
70 80 90 100 110 120

120 130 140 150 160 170 179

ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYCEAVRFDTEALRERLMLPEKNA
1 11 I I I I I I I I I II II I I I I I I M I I I I I I I I I I I I I I I I I I I I I 1 I ! I : I I : I I I II I
ESNERLHLMPSPQEGVQKYFWFMGFSEKSGGLIRERDYREAVRFDTEALRRRLVLPEKNA
130 140 150 160 170 180

180 190 200 210 220 230 239

SEWLLFGYRSDVWAKWLEMWRQAGSPMTLLLAGTQIIDSLKQSGVIPQDALQNDGDVFQT

I I II I I M : I I I I I I I : I I : I I I I I I I I I II : II II I I I I I I I I I I : I I I I : I I II I

PEWLLFGYRGDVWAKWLDMWQQAGSLMTLLLAGAQIIDSLKQSGVIPQNALQNEGGVFQT
190 200 210 220 230 240

240 250 260 270 280 290 299

ASVRLVKIPFVPQQDFDQLLHLADCAVIRGEDSFVRAQLAGKPFFWHIYPQDENVHLDKL
I I I I I I I I I I I I I I I I I : I I II I 1 I I I I I I I I II I I : I I I I I I I I I I I I I I I I I II I I I I
ASVRLVKIPFVPQQDFDKLLHLADCAVIRGEDSFVRTQLAGKPFFWHIYPQDENVHLDKL
250 260 270 280 290 300

300 310 320 330 340 350 359

HAFWDKAHGFYTPETVSAHRRLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR
I I I I I I I : I I I I I I I : I : I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I M I
HAFWDKAYGFYTPETASVHRLLSDDLNGGEALSATQRLECWQTLQQHQNGWRQGAEDWSR
310 320 330 340 350 360

360 370 380

YLFGQPSAPEKLAAFVSKHQKIRX
On this basis, including the RGD sequence in the gonococcal protein, characteristic of adhesins,
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 
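The RGD motif referred to above can be located with a simple substring scan. The following is a minimal sketch, not the analysis actually performed for the specification. The function name `find_motif` is hypothetical, and the two fragments are short stretches copied from the sequences listed above, <SEQ ID 196> and <SEQ ID 190>.

```python
from typing import List


def find_motif(protein: str, motif: str = "RGD") -> List[int]:
    """Return the 1-based start position of every occurrence of motif."""
    protein = "".join(protein.split()).upper()
    positions = []
    start = protein.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = protein.find(motif, start + 1)  # allow overlapping hits
    return positions


# Residues 181-200 of gonococcal ORF32ng-1 <SEQ ID 196>.
ORF32NG_1_FRAGMENT = "PEWLLFGYRG DVWAKWLDMW"
# Residues 180-199 of meningococcal ORF32-1 <SEQ ID 190>.
ORF32_1_FRAGMENT = "SEWLLFGYRS DVWAKWLEMW"

print(find_motif(ORF32NG_1_FRAGMENT))  # position within the fragment
print(find_motif(ORF32_1_FRAGMENT))
```

In the gonococcal fragment the motif starts at the ninth residue, which corresponds to residue 189 of ORF32ng-1. The corresponding meningococcal stretch reads RSD at that position, so no RGD is found there.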

ORF32-1 (42 kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
7A shows the results of affinity purification of the His-fusion protein, and Figure 7B shows the
results of expression of the GST-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for ELISA, giving a positive result. These experiments confirm that
ORF32-1 is a surface-exposed protein, and that it is a useful immunogen.
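The 42 kDa figure given for ORF32-1 is consistent with a rough estimate from its length. The sketch below assumes an average residue mass of about 110 Da, a common rule of thumb rather than a value taken from the specification. The constant and function names are illustrative, and an exact mass would need per-residue masses.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, an assumption


def approx_mass_kda(n_residues: int) -> float:
    """Rough protein mass in kDa from the residue count alone."""
    if n_residues < 0:
        raise ValueError("residue count cannot be negative")
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# ORF32-1 <SEQ ID 190> has 382 residues, not counting the stop.
ORF32_1_LENGTH = 382

print(f"{approx_mass_kda(ORF32_1_LENGTH):.0f} kDa")
```

This gives an estimate for the untagged protein only. The His- and GST-fusion products analyzed by SDS-PAGE would run larger because of the fusion partner.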

Example 24 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 197>: 

1 . . TTGTTCCTGC GTGTNAAAGT GGGGCGTTTT TTCAGCAGTC CGGCGACGTG
51 GTTTCGGGNC AAAGACCCTG TAAATCAGGC GGTGTTGCGG CTGTATNCGG
101 ACGAGTGGCG GCA.ACTTCG GTACGTTGGA AAATAGNCGC AACGTCGCAC
151 AGCCTGTGGC TCTGCACGCT GCTCGGAATG CTGGTGTCGG TATTGTTGCT
201 GCTTTTGGTG CGGCAATATA CGTTCAACTG GGAAAGCACG CTGTTGAGCA
251 ATGCCGCTTC GGTACGCGCG GTGGAAATGT TGGCATGGCT GCCGTCGAAA
301 CTCGGTTTCC CTGTCCCCGA TGCGCGGTCG GTCATCGAAG GCCGTCTGAA
351 CGGCAATATT GCCGATGCGC GGGCTTGGTC GGGGCTGCTG GTCGNCAGTA
401 TCGCCTGCTA NGGCATCCTG CCGCGCCTG..

This corresponds to the amino acid sequence <SEQ ID 198; ORF33>: 

1 . . LFLRVKVGRF FSSPATWFRX KDPVNQAVLR LYXDEWRXTS VRWKIXATSH 
51 SLWLCTLLGM LVSVLLLLLV RQYTFNWEST LLSNAASVRA VEMLAWLPSK 
101 LGFPVPDARS VIEGRLNGNI ADARAWSGLL VXSIACXGIL PRL. . 

Further work revealed the complete nucleotide sequence <SEQ ID 199>: 

1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGACGA
51 AGGCGGTTTT ATTTTCAGCG GCGATCCCGT ACAGGCGACG GAGGCTTTGC
101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGGAGATG
151 ATTGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG
201 GTCGTTCTGG TTGTGGGTGG TGGCGGCGAC GTTTGCATTT TTTACCGGTT
251 TTTCAGTCAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG
301 GTTTTGGCGG GCGTGTTGGG CATGAATACG CTGATGCTGG CAGTATGGTT
351 GGCAATGTTG TTCCTGCGTG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG
401 CGACGTGGTT TCGGGGCAAA GACCCTGTAA ATCAGGCGGT GTTGCGGCTG
451 TATGCGGACG AGTGGCGGCA ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC
501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT
551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG
601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC
651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGCC
701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC
751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTGCTGG CTTGGGTAGT
801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGATTGGAT TTGGAAAAGC
851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG
901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCACCGAAAA TCATCTTGAA
951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAGTGG CAGGACGGCG
1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC
1051 ACCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC
1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCG GACCGCGGCG
1151 TGTTGCGGCA GATTGTCCGA CTCTCGGAAG CGGCGCAGGG CGGCGCGGTG
1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT
1251 GGAACATTGG CGTAACGCGC TGGCCGAATG CGGCGCGGCG TGGCTTGAGC
1301 CTGACAGGGC GGCGCAGGAA GGGCGTTTGA AAGACCAATA A

This corresponds to the amino acid sequence <SEQ ID 200; ORF33-1>:

1 MLNPSRKLVE LVRILDEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAEM
51 IDRNRMLRET LERVRAGSFW LWVVAATFAF FTGFSVTYLL MDNQGLNFFL
101 VLAGVLGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL
151 YADEWRQPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL
201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV
251 GSIACYGILP RLLAWVVCKI LLKTSENGLD LEKPYYQAVI RRWQNKITDA
301 DTRRETVSAV SPKIILNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA
351 TNREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV
401 VQLLAEQGLS DDLSEKLEHW RNALAECGAA WLEPDRAAQE GRLKDQ*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF33 shows 90.9% identity over a 143aa overlap with an ORF (ORF33a) from strain A of N. 
meningitidis: 



10 20 30

LFLRVKVGRFFSSPATWFRXKDPVNQAVLR
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
LMDNQGLNFFLVLAGVXGMNTLMLAVWLAMLFLRVKVGRFFSSPATWFRGKDPVNQAVLR
90 100 110 120 130 140

40 50 60 70 80 90

LYXDEWRXTSVRWKIXATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA
II I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I ! I I I I I I I I I I I :::: I I I
LYADEWRXPSVRWKIGATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLGDSSSVRL
150 160 170 180 190 200

100 110 120 130 140

VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL
I I I I I I I I : II II I I I I I I : I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I
VEMLAWLPAKLGFPVPDARAVIEGRLNGNIADARAWSGLLVGSIACYGILPRLLAWAVCK
210 220 230 240 250 260

orf33a ILXXTSENGLDLEKXXXXXXIRRWQNKITDADTRRETVSAVSPKIVLNDAPKWAVMLETE
270 280 290 300 310 320

The complete length ORF33a nucleotide sequence <SEQ ID 201> is: 



1 ATGTTGAATC CATCCCGAAA ACTGGTTGAG CTGGTCCGTA TTTTGGAAGA
51 AGGCGGCTTT ATTTTCAGCG GCGATCCCGT GCAGGCGACG GAGGCTTTGC
101 GCCGCGTGGA CGGCAGTACG GAGGAAAAAA TCATCCGTCG GGCGAAGATG
151 ATCGACAGGA ACCGTATGCT GCGGGAGACG TTGGAACGTG TGCGTGCGGG
201 GTCGTTCTGG TTGTGGGTGG CGGCGGCGAC GTTTGCGTTT NTTACCGNTT
251 TTTCAGTTAC TTATCTTCTA ATGGACAATC AGGGTCTGAA TTTCTTTTTG
301 GTTTTGGCGG GCGTGNTGGG CATGAATACG CTGATGCTGG CAGTATGGTT
351 GGCAATGTTG TTCCTGCGCG TGAAAGTGGG GCGTTTTTTC AGCAGTCCGG
401 CGACGTGGTT TCGGGGCAAA GACCCTGTCA ATCAGGCGGT GTTGCGGCTG
451 TATGCGGACG AGTGGCGGCN ACCTTCGGTA CGTTGGAAAA TAGGCGCAAC
501 GTCGCACAGC CTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT
551 TGTTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG
601 TTGGGCGATT CGTCTTCGGT ACGGCTGGTG GAAATGTTGG CATGGCTGCC
651 TGCGAAACTG GGTTTTCCCG TGCCTGATGC GCGGGCGGTC ATCGAAGGTC
701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC
751 GGCAGTATCG CCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGCGGT
801 ATGCAAAATC CTTNTGNAAA CAAGCGAAAA CGGCTTGGAT TTGGAAAAGC
851 NCNNNNNTCN NNCGNTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG
901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCCGAAAA TCGTCTTGAA
951 CGATGCGCCG AAATGGGCGG TCATGCTGGA GACCGAATGG CAGGACGGCG
1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC
1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC
1101 GGCGCAACTG CTTATCGGCG TGCGCGCCCA AACTGTGCCC GACCGCGGCG
1151 TGTTGCGGCA GATCGTCCGA CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG
1201 GTGCANCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT
1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTGGAAC
1301 CCGACAGAGC GGCGCAGGAA GGCCGTCTGA AAACCAACGA CCGCACTTGA

This encodes a protein having amino acid sequence <SEQ ID 202>: 

1 MLNPSRKLVE LVRILEEGGF IFSGDPVQAT EALRRVDGST EEKIIRRAKM
51 IDRNRMLRET LERVRAGSFW LWVAAATFAF XTXFSVTYLL MDNQGLNFFL
101 VLAGVXGMNT LMLAVWLAML FLRVKVGRFF SSPATWFRGK DPVNQAVLRL
151 YADEWRXPSV RWKIGATSHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL
201 LGDSSSVRLV EMLAWLPAKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV
251 GSIACYGILP RLLAWAVCKI LXXTSENGLD LEKXXXXXXI RRWQNKITDA
301 DTRRETVSAV SPKIVLNDAP KWAVMLETEW QDGEWFEGRL AQEWLDKGVA
351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV
401 VXLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRAAQE GRLKTNDRT*

ORF33a and ORF33-1 show 94.1% identity in 444 aa overlap: 



10 20 30 40 50 60 

orf 33a pep MLNPSRKLVELVRILEEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAKMIDRNRMLRET 

I I I I I ! M M I I I I I : I I I I M I I I I I I I I 1 I ! I I I I ! I I I 1 I I I I I I : I I I I I I M I I I 

orf 33-1 MLNPSRKLVELVRILDEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAEMI DRNRMLRET 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 33a . pep LERVRAGSFWLWVAAATFAFXTXFSVTYLLMDNQGLNFFLVLAGVXGMNTLMLAVWLAML 
I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 33-1 LERVRAGSFWLWVVAATFAFFTGFSVTYLLMDNQGLNFFLVLAGVLGMNT LMLAVWLAML 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 33a . pep FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRXPSVRWKIGATSHS LWLCTLLGML 

I 1 I I I II I I I I I I II II I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I i I I I I I 
orf 33-1 FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHSLWLCTLLGML 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 33a . pep VSVLLLLLVRQYTFNWESTLLGDSSSVRLVEMLAWLPAKLGFPVPDARAVIEGRLNGNIA 
I I I I II M I I I I I I I I I I I I I : : : : I M I I I I II II : I II I I I I I I I I I I I I I I I I I II 

orf 33-1 VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA 
190 200 210 220 230 240 



250 260 270 280 290 300 

orf 33a . pep DARAWSGLLVGSIACYGILPRLLAWAVCKILXXTSENGLDLEKXXXXXXIRRWQNKITDA 
I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I i I 

orf33-1 DARAWSGLLVGSIACYGILPRLLAWVVCKILLKTSENGLDLEKPYYQAVIRRWQNKITDA

250 260 270 280 290 300 



310 320 330 340 350 360 

orf 33a . pep DTRRETVSAVSPKIVLNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVAANREQVAALE 

! I ! I I I I I I I I I I I : I I I I II I I 1 I I I I I I I I II I II I I I I I I I I I 1 I I I : I I M I I I I I 
orf 33-1 DTRRETVSAVSPKIILNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE 
310 320 330 340 350 360 



370 380 390 400 410 420 

orf 33a . pep TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAWXLLAEQGLSDDLSEKLEHW 
1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II II M M I I I I I 
orf 33-1 TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVQLLAEQGLSDDLSEKLEHW 

370 380 390 400 410 420 



430 440 450 

orf 33a . pep RNALTECGAAWLEPDRAAQEGRLKTNDRTX 

1 I 1 I : I I I I M I I I I I I II I I I I I 
orf33-1 RNALAECGAAWLEPDRAAQEGRLKDQX

430 440 
Homology with a predicted ORF from N. gonorrhoeae

ORF33 shows 91.6% identity over a 143aa overlap with a predicted ORF (ORF33.ng) from N. 
gonorrhoeae: 

orf33.pep                               LFLRVKVGRFFSSPATWFRXKDPVNQAVLR  30
                                        ||||||||||||||||||| | ||||||||
orf33ng   LMDNQGLNFFLVLAGVLGMNTLMLAVWLATLFLRVKVGRFFSSPATWFRGKGPVNQAVLR 100

orf33.pep LYXDEWRXTSVRWKIXATSHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA  90
          || |:||  |||||| ||:|||||||||||||||||||||||||||||||||||||||||
orf33ng   LYADQWRQPSVRWKIGATAHSLWLCTLLGMLVSVLLLLLVRQYTFNWESTLLSNAASVRA 160

orf33.pep VEMLAWLPSKLGFPVPDARSVIEGRLNGNIADARAWSGLLVXSIACXGILPRL 143
          |||||||||||||||||||:||||||||||||||||||||| ||:| ||||||
orf33ng   VEMLAWLPSKLGFPVPDARAVIEGRLNGNIADARAWSGLLVGSIVCYGILPRLLAWVVCK 220

An ORF33ng nucleotide sequence <SEQ ID 203> was predicted to encode a protein having amino 
acid sequence <SEQ ID 204>: 

1 MIDRDRMLRD TLERVRAGSF WLWVVVASMM FTAGFSGTYL LMDNQGLNFF

51 LVLAGVLGMN TLMLAVWLAT LFLRVKVGRF FSSPATWFRG KGPVNQAVLR

101 LYADQWRQPS VRWKIGATAH SLWLCTLLGM LVSVLLLLLV RQYTFNWEST

151 LLSNAASVRA VEMLAWLPSK LGFPVPDARA VIEGRLNGNI ADARAWSGLL

201 VGSIVCYGIL PRLLAWVVCK ILLKTSENGL DLEKTYYQAV IRRWQNKITD

251 ADTRRETVSA VSPKIVLNDA PKWALMLETE WQDGQWFEGR LAQEWLDKGV

301 AANREQVAAL ETELKQKPAQ LLIGVRAQTV PDRGVLRQIV RLSEAAQGGA

351 VVQLLAEQGL SDDLSEKLEH WRNALTECGA AWLEPDRVAQ EGRLKDQ*

Further sequence analysis revealed the following DNA sequence <SEQ ID 205>: 

1 ATGTTGaatC CATCCCgaAA ACTGgttgag ctGgTCCgtA Ttttgaataa 

51 agggggtTTT attttcagcg gcgatcctgt gcaggcgacg gaggctttgc 

101 gccgcgtgga cggcAGTACG GAggAaaaaa tcttccgtcg GGCGGAGAtg 

151 atcgACAGGg accgtatgtt gcgggACaCg TtggaacGTG TGCGTGCggg 

201 gtcgtTctgG TTATGGGTGG TggtggCAtC gATGATGTtt aCCGCCGGAT 

251 TTTCAGgcac ttatCttCTG ATGGACaatC AGGGGCtGAA TtTCTTTTTA 

301 GTTTTggcgG GAGTGTtggG CATGaatacG ctgATGCTGG CAGTATGGtt 

351 gGCAACGTTG TTCCTGCGCG TGAAAGTGGG ACGGTTTTTC AGCAGTCCGG 

401 CGACGTGGTT TCGGGGCAAA GGCCCTGTAA ATCAGGCGGT GTTGCGGCTG 

451 TATGCGGACC AGTGGCGGCA ACCTTCGGTA CGATGGAAAA TAGGCGCAAC 

501 GGCGCACAGC TTGTGGCTCT GCACGCTGCT CGGAATGCTG GTGTCGGTAT 

551 TGCTGCTGCT TTTGGTGCGG CAATATACGT TCAACTGGGA AAGCACGCTG 

601 TTGAGCAATG CCGCTTCGGT ACGCGCGGTG GAAATGTTGG CATGGCTGCC 

651 GTCGAAACTC GGTTTCCCTG TCCCCGATGC GCGGGCGGTC ATCGAAGGTC 

701 GTCTGAACGG CAATATTGCC GATGCGCGGG CTTGGTCGGG GCTGCTGGTC

751 GGCAGTATCG TCTGCTACGG CATCCTGCCG CGCCTCTTGG CTTGGGTAGT 

801 GTGTAAAATC CTTTTGAAAA CAAGCGAAAA CGGattgGAT TTGGAAAAAA 

851 CCTATTATCA GGCGGTCATC CGCCGCTGGC AGAACAAAAT CACCGATGCG 

901 GATACGCGTC GGGAAACCGT GTCCGCCGTT TCGCcgaAAA TCGTCTTGAA 

951 CGATGCGCCG AAATGGGCGC TCATGCTGGA GACCGAGTGG CAGGACGGCC 

1001 AATGGTTCGA GGGCAGGCTG GCGCAGGAAT GGCTGGATAA GGGCGTTGCC

1051 GCCAATCGGG AACAGGTTGC CGCGCTGGAG ACAGAGCTGA AGCAGAAACC 

1101 GGCGCAACTG CTTATCGGCG TACGCGCCCA AACTGTGCCG GACCGGGGCG 

1151 TGCTGCGGCA GATTGTGCGG CTTTCGGAAG CGGCGCAGGG CGGCGCGGTG 

1201 GTGCAGCTTT TGGCGGAACA GGGGCTTTCA GACGACCTTT CGGAAAAGCT

1251 GGAACATTGG CGTAACGCGC TGACCGAATG CGGCGCGGCG TGGCTTGAGC 

1301 CTGACAGGGT GGCGCAGGAA GGCCGTTTGA AAGACCAATA A 

This encodes a protein having amino acid sequence <SEQ ID 206; ORF33ng-1>:



1 MLNPSRKLVE LVRILNKGGF IFSGDPVQAT EALRRVDGST EEKIFRRAEM

51 IDRDRMLRDT LERVRAGSFW LWVVVASMMF TAGFSGTYLL MDNQGLNFFL

101 VLAGVLGMNT LMLAVWLATL FLRVKVGRFF SSPATWFRGK GPVNQAVLRL

151 YADQWRQPSV RWKIGATAHS LWLCTLLGML VSVLLLLLVR QYTFNWESTL

201 LSNAASVRAV EMLAWLPSKL GFPVPDARAV IEGRLNGNIA DARAWSGLLV

251 GSIVCYGILP RLLAWVVCKI LLKTSENGLD LEKTYYQAVI RRWQNKITDA

301 DTRRETVSAV SPKIVLNDAP KWALMLETEW QDGQWFEGRL AQEWLDKGVA

351 ANREQVAALE TELKQKPAQL LIGVRAQTVP DRGVLRQIVR LSEAAQGGAV

401 VQLLAEQGLS DDLSEKLEHW RNALTECGAA WLEPDRVAQE GRLKDQ*
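
The relationship between <SEQ ID 205> and <SEQ ID 206> can be checked by translating the DNA in its first reading frame. The following is a minimal sketch, assuming the standard genetic code; the function name `translate` and the choice to report ambiguous codons as `X` are illustrative assumptions and are not taken from the application.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1; unrecognised codons become 'X'."""
    # Remove the spacing used in the listing layout and ignore letter case.
    seq = "".join(dna.split()).upper()
    protein = []
    # Step through complete codons only; trailing bases are ignored.
    for pos in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# First row (bases 1-50) of <SEQ ID 205>.
first_row = "ATGTTGaatC CATCCCgaAA ACTGgttgag ctGgTCCgtA Ttttgaataa"
print(translate(first_row))
```

Applied to the first row of <SEQ ID 205>, this yields MLNPSRKLVELVRILN, which agrees with the first sixteen residues of <SEQ ID 206>.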

ORF33ng-1 and ORF33-1 show 94.6% identity in 446 aa overlap:



orf33-1   MLNPSRKLVELVRILDEGGFIFSGDPVQATEALRRVDGSTEEKIIRRAEMIDRNRMLRET
          |||||||||||||||::|||||||||||||||||||||||||||:||||||||:||||:|
orf33ng-1 MLNPSRKLVELVRILNKGGFIFSGDPVQATEALRRVDGSTEEKIFRRAEMIDRDRMLRDT
                  10        20        30        40        50        60



70 80 90 100 110 120 

LERVRAGSFWLWVVAATFAFFTGFSVTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLAML
| | | 1 | | | | | i | [ I I : I : : I : I I 1 ! I I I I I I I I I II I M II I I I I I I I I I I I I I I I I 
LERVRAGSFWLWVVVASMMFTAGFSGTYLLMDNQGLNFFLVLAGVLGMNTLMLAVWLATL

70 80 90 100 110 120 

130 140 150 160 170 180 

FLRVKVGRFFSSPATWFRGKDPVNQAVLRLYADEWRQPSVRWKIGATSHSLWLCTLLGML 
| || || II I I I I I I I I I I I M I I I I I I I I I I I I = I I I I I I I I I I I I I : I I I I I I M M I I 
FLRVKVGRFFSSPATWFRGKGPVNQAVLRLYADQWRQPSVRWKIGATAHSLWLCTLLGML 

130 140 150 160 170 180 

190 200 210 220 230 240 

VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA 

I I I I I IN I I I I I I I I I I I I I I I I I I I I I I I I I I 

VSVLLLLLVRQYTFNWESTLLSNAASVRAVEMLAWLPSKLGFPVPDARAVIEGRLNGNIA 
190 200 210 220 230 240 

250 260 270 280 290 300 

DARAWSGLLVGSIACYGILPRLLAWVVCKILLKTSENGLDLEKPYYQAVI RRWQNKITDA 
j M | M I I I I I I I : II I I I I I I M I I I I I I I II I I I I I I I I I I I I I M I II I I I I I I II 
DARAWSGLLVGSIVCYGILPRLLAWVVCKILLKTSENGLDLEKTYYQAVIRRWQNKITDA 

250 260 270 280 290 300 

310 320 330 340 350 360 

DTRRETVSAVSPKIILNDAPKWAVMLETEWQDGEWFEGRLAQEWLDKGVATNREQVAALE 
I I I I I I I I I I I II I : I I M I I I I : I I I I I I I I I : I I I I M II M I I I I I I : I I I I I I I I I 
DTRRETVSAVSPKIVLNDAPKWALMLETEWQDGQWFEGRLAQEWLDKGVAANREQVAALE 

310 320 330 340 350 360 

370 380 390 400 410 420 

TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVQLLAEQGLSDDLSEKLEHW

TELKQKPAQLLIGVRAQTVPDRGVLRQIVRLSEAAQGGAVVQLLAEQGLSDDLSEKLEHW
370 380 390 400 410 420 



430 440

RNALAECGAAWLEPDRAAQEGRLKDQX

RNALTECGAAWLEPDRVAQEGRLKDQX
430 440



Based on the presence of several putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies. 
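
The application does not state which method was used to identify the putative transmembrane domains. One common approach is a sliding-window hydropathy scan, sketched below under that assumption. The Kyte-Doolittle scale values are standard; the function name `hydrophobic_windows`, the 19-residue window and the 1.6 threshold are illustrative choices only.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return 0-based start positions of windows whose mean hydropathy
    is at least `threshold`. Unknown residues such as 'X' score 0."""
    scores = [KD.get(res, 0.0) for res in protein.upper()]
    starts = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean >= threshold:
            starts.append(start)
    return starts


# Residues 171-189 of <SEQ ID 206>, a leucine-rich stretch.
segment = "LWLCTLLGMLVSVLLLLLV"
print(hydrophobic_windows(segment))
```

A window that passes the threshold is only a candidate membrane-spanning segment; the cut-off and window length would need tuning against proteins of known topology.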



Example 25 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 207>: 

1 . . CAGAAGAGTT TGTCGAGAAT TTCTTTATGG GGTTTGGGCG GCGTGTTTTT 
51 CGGGGTGTCC GGTCTGGTAT GGTTTTCTTT GGGCGTTTCT TT.GAGTGCG

101 CCTGTTTTTC GGGTGTTTCT TTTCGGGGTT CGGGACGGGG GACGTTTGTG 

151 GGCAGTACGG GGGTTTCTTT GAGTGTGTTT TCAGCTTGTG TTCC.GGCGT 

201 CGTCCGGCTG CCTGTCGGTT TGAGCTGTGT CGGCAGGTTG CG..GTTTGA 

251 CCCGGTTTTT CTTGGGTGCG GCAGGGGACG TCATTCTCCT GCCGCTTTCG 

301 TCTGTGCCGT CCGGCTGTGC GGGTTCGGAT GAGGCGGCGT GGTGGTGTTC 

351 GGGTTGGGCG GCATCTTGTT CCGACTACGC CGTTTGGCAG CCAGAATTCG 

401 GTTTCGCGGG GGCTGTCGGT GTGTTGCGGT TCGGCTTGAA GGGTTTTGTC 

451 GTCC. 

This corresponds to the amino acid sequence <SEQ ID 208; ORF34>: 

1 ..QKSLSRISLW GLGGVFFGVS GLVWFSLGVS XECACFSGVS FRGSGRGTFV 

51 GSTGVSLSVF SACVXGVVRL PVGLSCVGRL XXLTRFFLGA AGDVILLPLS

101 SVPSGCAGSD EAAWWCSGWA ASCPTTPFGS QNSVSRGLSV CCGSA*RVLS 

151 S.. 

Further work revealed the complete nucleotide sequence <SEQ ID 209>: 

1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCkGGTG TGCCTGCCGT 

51 GCCGGGTCAG AATAGGTTGT CCAGAATTTC TTTATGGGGT TTGGGCGGCG 

101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTG 

151 GGCTGCGCCT GTTTTTCGGG TGTTTCTTTT CGGGGTTCGG GACGGGGGAC 

201 GTTTGTGGGC AGTACGGGGG TTTCTTTGAG TGTGTTTTCA GCTTGTGTTC 

251 CGGCGTCGTC CGGCTGCCTG TCGGTTTGAG CTGTGTCGGC AGGTTGCGGT 

301 TTGACCCGGT TTTTCTTGGG TGCGGCAGGG GACGGCAGTC CGCTGCCGCT
351 TTCGTCTGTG CCGTCCGGCT GTGCGGGTTC GGATGAGGCG GCGTGGTGGT

401 GTTCGGGTTG GGCGGCATCT TGTCCGACTA CGCCGTTTGG CAGCCAGAAT
451 TCGGTTTCGC GGGGGCTGTC GGTGTGTTGC GGTTCGGCTT GAAGGGTTTT
501 GTCGCCGTTC GGGTTGAATG TGCTGACGAT GCCTATTGCC AATGCGCCGA
551 TGGCGGCGAT ACAGATGAGC AATACGGCGC GTATCAGGAG TTTGGGGGTC
601 AGCCTGAAGG GTTTGTTCGG TTTTTTTGCC ATTTTGATTG TGCTTTTGGG
651 GTGTCGGGCA ATGCCGTCTG AAGGCGGTTC AGACGGCATT GCCGAGTCAG
701 CGTTGGACGT AGTTTTGGTA GAGGGTGATG ACTTTTTGTA CGCCGACGGT

751 GGTGCTGACT TTTTGGGTAA TCTGCGCCTG TTCTTCGGGG GTGAGGATGC

801 CCATAACGTA GGTTACGTTG CCGTAGGTAA CGATTTTGAC GCGCGCCTGT
851 GTGGCGGGGC TGATGCCCAA CAGCGTGGCG CGGACTTTGG ATGTGTTCCA
901 AGTGTCGCCG GCGATGTCGC CGGCAGTGCG CGGCAGGGAG GCGACGGTAA 
951 TATAGTTGTA CACGCCTTCG GCGGCCTGTT CGGAACGTGC AATCTGACCG 

1001 ACGAACTGTT TTTCGCCTTC GGTGGCGACT TGTCCGAGCA GCAGCAGGTG 

1051 GCGGTTGTAG CCGACGACGG AGATTTGGGG CGTGTAGCCT TTGGTTTGGT 

1101 TGTTTTGGCG CAGATAGGAA CGGGCGGTGG TTTCGATACG CAACGCCATA 

1151 ACGTTGTCGT CGGTTTGCGC GCCGGTGGTT CGGCGGTCGA CGGCGGATTT 

1201 CGCGCCGACG GCGGCGCTTC CGATTACTGC GCTGACGCAG CCGCTAAGGG 

1251 CAAGGCTGAA AATGGCGGCA ATCAGGGTGC GGACGGTGTG CGGTTTGGGT 

1301 TTCATCGGGT GCTTCCTTTC TTGGGCGTTT CAGACGGCAT TGCTTTGCGC 

1351 CATGCCGTCT GA

This corresponds to the amino acid sequence <SEQ ID 210; ORF34-1>:

1 MMMPFIMLPW IAGVPAVPGQ NRLSRISLWG LGGVFFGVSG LVWFSLGVSL

51 GCACFSGVSF RGSGRGTFVG STGVSLSVFS ACVPASSGCL SV*AVSAGCG

101 LTRFFLGAAG DGSPLPLSSV PSGCAGSDEA AWWCSGWAAS CPTTPFGSQN

151 SVSRGLSVCC GSA*RVLSPF GLNVLTMPIA NAPMAAIQMS NTARIRSLGV

201 SLKGLFGFFA ILIVLLGCRA MPSEGGSDGI AESALDVVLV EGDDFLYADG

251 GADFLGNLRL FFGGEDAHNV GYVAVGNDFD ARLCGGADAQ QRGADFGCVP

301 SVAGDVAGSA RQGGDGNIVV HAFGGLFGTC NLTDELFFAF GGDLSEQQQV

351 AVVADDGDLG RVAFGLVVLA QIGTGGGFDT QRHNVVVGLR AGGSAVDGGF

401 RADGGASDYC ADAAAKGKAE NGGNQGADGV RFGFHRVLPF LGVSDGIALR

451 HAV*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)



ORF34 shows 73.3% identity over a 161aa overlap with an ORF (ORF34a) from strain A of N. 
meningitidis: 
orf34 pep QKSLSR ISLWGLGGVFFGVSGLVW FSLG VSXE CAC

I I 1 I I I I i i I ! I 1 I I I I I M 1 I 1 I I I 1 I Ml 
orf34a M MXPXIMLPWIAGVPA VPGQKRLSR XSLWGLGGXFFGVSGLV WFSLG VSXSLGVSXGCAC 
10 20 30 40 50 60 

40 50 60 70 80 90 

orf34 pep FSGV SFRGSGRG TFVGSTGVSLSVFSACV XGWRLPVGLSCVGRLXX LTRFFLGA 

I I I I I I I I I I i I ! I I I I I I I I I I I I I I I : I :: : I : : I I I I I I 

o r f 3 4 a FS GV S FRG SGRG T FVG S T G V S L SVFSACA PAS SGCLSVXAVS AGCGLTRXFXGA 

70 80 90 100 110 

100 110 120 130 140 150 

orf34 pep AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS 
Ml | I I I I ] I I I I I I : I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I = I 1 I I 
orf34a AGDGSPLPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLS 
120 130 140 150 160 170 

orf34.pep S 

or f 3 4a PFGXNVLTMPIANAPMAVIQMSNTARIRSL GVSLKGLFXFFAILIVLL GCRAMPSEGGSD 
180 190 200 210 220 230 

The complete length ORF34a nucleotide sequence <SEQ ID 211> is:

1 ATGATGATNC CGTTNATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT 

51 GCCGGGTCAG AAGAGGTTGT CGAGAANTTC TTTATGGGGT TTAGGCGGCN 

101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTNTT 

151 TCTTTGGGTG TTTCTNTGGG CTGTGCCTGT TTTTCGGGTG TTTCTTTTCG 

201 GGGTTCGGGA CGGGGGACGT TTGTGGGCAG TACNGGGGTT TCTTTGAGTG 

251 TGTTTTCAGC TTGTGCTCCG GCGTCGTCCG GCTGCCTGTC GGTTTNAGCT 

301 GTGTCGGCAG GTTGCGGTTT GACCCGGNTT TTCTTNGGTG CGGCAGGGGA 

351 CGGCAGTCCG CTGCCGCTTT CGTCTGTGCC GTCCGGCTGT GCGGGTGCGG 

401 ATGAGGAGGC GTNGTNGTGT TCGGGTTGGG CGGCATCTTG TCCGACTACG

451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG 

501 TTCGGTNTGG AGGGTTTTGT CNCCGTTCGG GTNGAATGTG CTGACGATGC 

551 CTATTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT 

601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCNGTT TTTTTGCCAT
651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG

701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTNGGTAGA GGGTGATGAC

751 TTTTTGTACG CCGACGGTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT

801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACGTTGCC GTAGGTAACG
851 ATTTTGACGC GCGCCTGTGT GGCGGGGCTG ATGCCCAACA GCGTGGCGCG 
901 GACTTTGGAT GTGTTCCAAG TGTCGCCGGC GATGTCGCCG GCAGTGCGCG 
951 GCAGGGAGGC GACGGTAATG TANTTGTACA CGCCTTCGGC GGCCTGTTCG 

1001 GAACGTGCAA TCTGACCGAC GAACTGTTTC TCGCCTTCGG TGGCGACTTG

1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACAACGGAG ATTTGGGGCG 

1101 TGTANCCTTT GGTTTGGTTG TTTTGGCGCA GATAGGAGCG GGCGGTGGTT 

1151 TCGATACGCA GCGCCATTAC GTTGTCGTCG GTTNGCGCGC CGGTGGTTCG 

1201 GCGGTCGACG GCGGATTTCG CGCCGACCGC CGCGCCGCCG ACGACTGCGC

1251 TGACGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAGT CAGGGTGCGG 

1301 ACGGTGTGCG GTTTGGGTTT CATCGGGTGC TTCCTTTCTT GGGCGTTTCA 

1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA 

This encodes a protein having amino acid sequence <SEQ ID 212>: 

1 MMXPXIMLPW IAGVPAVPGQ KRLSRXSLWG LGGXFFGVSG LVWFSLGVSX

51 SLGVSXGCAC FSGVSFRGSG RGTFVGSTGV SLSVFSACAP ASSGCLSVXA

101 VSAGCGLTRX FXGAAGDGSP LPLSSVPSGC AGADEEAXXC SGWAASCPTT

151 PFGSQNSVSR GLSVCCGSVW RVLSPFGXNV LTMPIANAPM AVIQMSNTAR

201 IRSLGVSLKG LFXFFAILIV LLGCRAMPSE GGSDGIAESA LDVVXVEGDD

251 FLYADGGADF LGNLRLFFGG EDAHNVGYVA VGNDFDARLC GGADAQQRGA

301 DFGCVPSVAG DVAGSARQGG DGNVXVHAFG GLFGTCNLTD ELFLAFGGDL

351 SEQQQVAVVA DNGDLGRVXF GLVVLAQIGA GGGFDTQRHY VVVGXRAGGS

401 AVDGGFRADR RAADDCADAA AEGKAEDGGS QGADGVRFGF HRVLPFLGVS

451 DGIALRHAV*

ORF34a and ORF34-1 show 91.3% identity in 459 aa overlap: 
MMXPXIMLPWIAGVPAVPGQKRLSRXSLWGLGGXFFGVSGLVWFSLGVSXSLGVSXGCAC
I I I i I I M I II M II I I I : I I I I I M I I II I I M I I I I II M I I I I I I I 

MMMPFIMLPWIAGVPAVPGQNRLSRISLWGLGGVFFGVSGLVWFSLGVSL GCAC 



10 20 30 40 50



70 80 90 100 110 120 

FSGVSFRGSGRGTFVGSTGVSLSVFSACAPASSGCLSVXAVSAGCGLTRXFXGAAGDGSP 
I I I I I I II I I I M I I I I I I I I I I I II I I : I II I I II I I I I I I I I II II I I II I 1 I I I I 
FSGVSFRGSGRGTFVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP 
60 70 80 90 100 110 

130 140 150 160 170 180 

LPLSSVPSGCAGADEEAXXCSGWAASCPTTPFGSQNSVSRGLSVCCGSVWRVLSPFGXNV 



190 200 210 220 230 240 

LTMPIANAPMAVIQMSNTARIRSLGVSLKGLFXFFAILIVLLGCRAMPSEGGSDGIAESA 
I I I I I I I I I II : I I I I II I I I M II I I I I II I I I I I I I I I I I I M M M I I I I I I M I I 
LTMPIANAPMAAIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA 
180 190 200 210 220 230 

250 260 270 280 290 300 

LDVVXVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA
I I I I I I I I I I I I I I I I M I I I I i I I I I I I I II I I I I I I H I I I I I I | | I | | M I I I I I I 
LDVVLVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA 
240 250 260 270 280 290 

310 320 330 340 350 360 

DFGCVPSVAGDVAGSARQGGDGNVXVHAFGGLFGTCNLTDELFLAFGGDLSEQQQVAVVA
1 M I I I I I II I I I I I I I I I I I II : I I 1 I I I I I I I I I I ] ] I M : II II I I I I II I I I I I I 
DFGCVPSVAGDVAGSARQGGDGNIVVHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAVVA
300 310 320 330 340 350 

370 380 390 400 410 420 

DNGDLGRVXFGLVVLAQIGAGGGFDTQRHYVVVGXRAGGSAVDGGFRADRRAADDCADAA
I : I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I M | |:| Mill 
DDGDLGRVAFGLVVLAQIGTGGGFDTQRHNVVVGLRAGGSAVDGGFRADGGASDYCADAA

360 370 380 390 400 410 



430 440 450 460 

AEGKAEDGGSQGADGVRFGFHRVLPFLGVSDGIALRHAVX 



Homology with a predicted ORF from N. gonorrhoeae 

ORF34 shows 77.6% identity over a 161aa overlap with a predicted ORF (ORF34.ng) from N.
gonorrhoeae: 



65 



QKSLSRISLWGLGGVFFGVSGLVWFSLGVSXE CAC 

II I I I I I I M I : I I I I I I I M M I I I I M III 
MMMPFIMLPWIAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFSLGVSFSLGVSLGCAC 

FSGVSFRGSGRGTFVGSTGVSLSVFSACVXGWRLPVGLSCV- 



FSGVSFRGSGWGAFVGSTGVSLSVFSACVP-- 



■GRLXXLTRFFLGA 90 
: I I : I = II I I I I I I II 

VPVNESAARAASEGR — GLTRFFLGA 114 



orf34.pep AGDVILLPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLS 150

HI M I M II I I I I II I I ! I I I I II I I I I I II : I I I [ I I I I I I I I I I | | | | : MM 

orf34ng AGDGSPLPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLS 174 

orf34.pep S 151

orf34ng PFGLNVLTMPTANAPMAVIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSD 234 






The complete length ORF34ng nucleotide sequence <SEQ ID 213> is:

1 ATGATGATGC CGTTCATAAT GCTTCCTTGG ATTGCGGGTG TGCCTGCCGT

51 GCCGGGTCAA AAGAGGTTGT CGAGAATCTC TTTATGGGGT TTGGCCGGCG 

101 TGTTTTTCGG GGTGTCCGGT TTGGTATGGT TTTCTTTGGG CGTTTCTTTT 

151 TCTTTGGGTG TTTCTTTGGG CTGCGCCTGT TTTTCGGGTG TTTCTTTTCG 

201 GGGTTCGGGA TGGGGGGCGT TTGTGGGCAG TACGGGGGTT TCTTTGAGTG 

251 TGTTTTCAGC TTGTGTTCCG GTGCCGGTTA ACGAATCGGC TGCCCGGGCC 

301 GCATCCGAAG GGCGCGGTTT gACCCGGTTT TTCTTGGGTG CGGCAGGGGA 

351 CGGCAGTCCG CTGCCGCTTT CTTCTGTGCC GTCCGGCTGT GCGGGTTCGG 

401 ATGAGGCGGC GTGGTGGTGT TCGGGTTGGG CGGCATCTTG TCCGACGGCG 

451 CCGTTTGGCA GCCAGAATTC GGTTTCGCGG GGGCTGTCGG TGTGTTGCGG 

501 TTCGGTTTGG AGGGTTTTGT CGCCGTTCGG GTTGAATGTG CTGACGATGC 

551 CTACTGCCAA TGCGCCGATG GCGGTGATAC AGATGAGCAA TACGGCGCGT 

601 ATCAGGAGTT TGGGGGTCAG CCTGAAGGGT TTGTTCGGTT TTTTTGCCAT

651 TTTGATTGTG CTTTTGGGGT GTCGGGCAAT GCCGTCTGAA GGCGGTTCAG 

701 ACGGCATTGC CGAGTCAGCG TTGGACGTAG TTTTGGTAGA GGGTAATGAC 

751 TTTTTGTACG CCGAcggTGG TGCTGACTTT TTGGGTAATC TGCGCCTGTT

801 CTTCGGGGGT GAGGATGCCC ATAACGTAGG TTACATTGCC GTAGGTAATG 

851 ATTTTGACGC GCGCCTGTGT AGCGGGGCTG ATGCCCAGCA GcgtgGCGCG 

901 GACTTTGGAC GTGTTCCAAG TGTCGCCGGC GATGTCGCCC GCAGTGCGCG 

951 GCAGGGAGGC GACGGTAATG TAGTTGTATA CGCCTTCGGC GGCCTGTTCG 

1001 GAACGTGCAA TCTGACCGAC GAACTGTTTT TCGCCTTCGG TGGCGACTTG 

1051 TCCGAGCAGC AGCAGGTGGC GGTTGTAGCC GACGACGGAG ATTTGGGGCG 

1101 TGTAGCCTTT GGTTTGGTTG TTTTGGCGCA GGTAGGAACG GGCGGTGGTT 

1151 TCGATACGCA ACGCCATAAC GTtgtCATCG GTTtgcgcgc CGGTGGTTcg 

1201 gCGGTCGATG ACGGATTTTG CGCCGACGGC GGCCCCGCCG ACGACTGCGC 

1251 TGAAGCAGCC GCCGAGGGCA AGGCTGAGGA CGGCGGCAAT CAGGGTGCGG 

1301 ACGGTGTGTG GTTTGGGTTT CATCGGGGAC TTCCTTTCTT GGGCGTTTCA 

1351 GACGGCATTG CTTTGCGCCA TGCCGTCTGA 

This encodes a protein having amino acid sequence <SEQ ID 214>: 

1 MMMPFIMLPW IAGVPAVPGQ KRLSRISLWG LAGVFFGVSG LVWFSLGVSF

51 SLGVSLGCAC FSGVSFRGSG WGAFVGSTGV SLSVFSACVP VPVNESAARA

101 ASEGRGLTRF FLGAAGDGSP LPLSSVPSGC AGSDEAAWWC SGWAASCPTA

151 PFGSQNSVSR GLSVCCGSVW RVLSPFGLNV LTMPTANAPM AVIQMSNTAR

201 IRSLGVSLKG LFGFFAILIV LLGCRAMPSE GGSDGIAESA LDVVLVEGND

251 FLYADGGADF LGNLRLFFGG EDAHNVGYIA VGNDFDARLC SGADAQQRGA

301 DFGRVPSVAG DVARSARQGG DGNVVVYAFG GLFGTCNLTD ELFFAFGGDL

351 SEQQQVAVVA DDGDLGRVAF GLVVLAQVGT GGGFDTQRHN VVIGLRAGGS

401 AVDDGFCADG GPADDCAEAA AEGKAEDGGN QGADGVWFGF HRGLPFLGVS

451 DGIALRHAV*

ORF34ng and ORF34-1 show 90.0% identity in 459 aa overlap: 

10 20 30 40 50
MMMPFIMLPWIAGVPAVPGQNRLSRISLWGLGGVFFGVSGLVWFSLGVS LGCAC 

I I I I I I I I I M I I I I I ! I I I : I I I I I I 1 I I I : I I I I I I I I I I II I I I I I I I I I I 
MMMPFIMLPWIAGVPAVPGQKRLSRISLWGLAGVFFGVSGLVWFSLGVSFSLGVS LGCAC 

10 20 30 40 50 60 

60 70 80 90 100 110 

FSGVSFRGSGRGTFVGSTGVSLSVFSACVPASSGCLSVXAVSAGCGLTRFFLGAAGDGSP 

II I I I I I I I I I : II I I I I I I I I I II I I I I = : : : I : i I M I I I I I I I I I I I I I 
FSGVSFRGSGWGAFVGSTGVSLSVFSACVPVPVNESAARAASEGRGLTRFFLGAAGDGSP 

70 80 90 100 110 120 

120 130 140 150 160 170 

LPLSSVPSGCAGSDEAAWWCSGWAASCPTTPFGSQNSVSRGLSVCCGSAXRVLSPFGLNV 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I II II I I : I I I I I I I I I I 
LPLSSVPSGCAGSDEAAWWCSGWAASCPTAPFGSQNSVSRGLSVCCGSVWRVLSPFGLNV 
130 140 150 160 170 180 

180 190 200 210 220 230 

LTMPIANAPMAAIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA 
I I I I I I I I I I : ! I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I ! I I II I I I I I I I I I I 
LTMPTANAPMAVIQMSNTARIRSLGVSLKGLFGFFAILIVLLGCRAMPSEGGSDGIAESA 
190 200 210 220 230 240 



orf 34-1 .pep 
orf 34ng 

orf 34-1 .pep 
orf34ng 

orf 34-1 .pep 
orf34ng 

orf34-l .pep 
orf34ng 
240 250 260 270 280 290

orf34-1.pep LDVVLVEGDDFLYADGGADFLGNLRLFFGGEDAHNVGYVAVGNDFDARLCGGADAQQRGA

I I I I I I I I : I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I : I I I I I I I I I I I : I I I I I I I I I 
orf34ng LDVVLVEGNDFLYADGGADFLGNLRLFFGGEDAHNVGYIAVGNDFDARLCSGADAQQRGA
250 260 270 280 290 300 



300 310 320 330 340 350 

orf34-1.pep DFGCVPSVAGDVAGSARQGGDGNIVVHAFGGLFGTCNLTDELFFAFGGDLSEQQQVAVVA

I I I I M M M M I I I I II M I : I I : I I I I I I I I I I I I I I I I I I I I I I I H I I I I I II I 
orf34ng DFGRVPSVAGDVARSARQGGDGNVVVYAFGGLFGTCNLTDELFFAFGGDLSEQQQVAVVA
310 320 330 340 350 360 



360 370 380 390 400 410 

orf34-1.pep DDGDLGRVAFGLVVLAQIGTGGGFDTQRHNVVVGLRAGGSAVDGGFRADGGASDYCADAA
1 I 1 I I I I II I M I II II : I I M I II II I I I II : II I I I M II I II I I I I : I I I : II 
orf34ng DDGDLGRVAFGLVVLAQVGTGGGFDTQRHNVVIGLRAGGSAVDDGFCADGGPADDCAEAA
370 380 390 400 410 420 



420 430 440 450 

orf34-1.pep AKGKAENGGNQGADGVRFGFHRVLPFLGVSDGIALRHAVX

I : II I I : I I I I I I M I I I I I I I M I I I I I I I I I I II I I 
orf34ng AEGKAEDGGNQGADGVWFGFHRGLPFLGVSDGIALRHAVX 
430 440 450 460 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 26 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 215>: 

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGATT.CAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG

101 CCGCCGCCGA CAACGGCGCG GCGTAAAAAA GAAATCGTCT TCGGCACGAC

151 CGTCGGCGAC TTCGGCGATA TGGTCAAAGA ACAAATCCAA GCCGAGCTGG

201 AGAAAAAAGG CTACACCGTC AAACTGGTCG AGTTTACCGA CTATGTACGC

251 CCGAATCTGG CATTGGCTGA GGGCGAGTTG 

This corresponds to the amino acid sequence <SEQ ID 216; ORF4>:



1 MKTFFKTLSA AALALILAAC G.QKDSAPAA SASAAADNGA AKKEIVFGTT 

51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GEL

Further sequence analysis revealed the complete nucleotide sequence <SEQ ID 217>: 

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGACAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAG CCGAGCTGGA 

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTACGCC

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC 

301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA 

351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA 

401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC

451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT

501 CAAACTCAAA GACGGCATCA ATCCGTTGAC CGCATCCAAA GCGGACATCG

551 CCGAGAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG 

601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC 

651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT 

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA 

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA 
801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG
851 GCGCAGCCAA ATAA 
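
The numbered listing layout used for these sequences (a position number followed by groups of ten letters) can be joined into a single string before further analysis. The helper below is a minimal sketch; the name `join_listing` is hypothetical, and the stray spaces in the sample rows are included only to illustrate the kind of damage that text extraction can introduce.

```python
import re


def join_listing(text):
    """Concatenate a numbered sequence listing into one uppercase string.

    Position numbers, spaces and every other non-letter character are
    discarded, so a stray space inside a ten-letter group is harmless.
    Note that '.' placeholders in partial sequences are dropped as well.
    """
    return re.sub(r"[^A-Za-z]", "", text).upper()


# Last two rows of <SEQ ID 217>, with extraction-style stray spaces.
tail = """
8 01 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAAT GAAG
851 GCGCAGCCAA ATAA
"""
seq = join_listing(tail)
print(len(seq), seq[-3:])
```

For the full <SEQ ID 217> the last row starts at position 851 and holds 14 bases, giving 864 nucleotides, that is 288 codons: 287 residues plus the stop codon, in agreement with the length of <SEQ ID 218>.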

This corresponds to the amino acid sequence <SEQ ID 218; ORF4-1>:

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT

51 VGDFGDMVKE QIQAELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH 

101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND

151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL 

201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ 

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF4 shows 93.5% identity over a 93aa overlap with an ORF (ORF4a) from strain A of N.
meningitidis: 

                  10        20         30        40        50       59
orf4.pep  MKTFFKTLSAAALALILAACG-QKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE
          ||||||||||||||||||||| ||||||||||||||||||| ||||||||||||||||||
orf4a     MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAXKEIVFGTTVGDFGDMVKE

         60        70        80        90
orf4.pep  QIQAELEKKGYTVKLVEFTDYVRPNLALAEGEL
           || ||||||||||||| ||||| |||||||||
orf4a     XIQPELEKKGYTVKLVEXTDYVRXNLALAEGELDINVXQHXXYLDDXKKXHNLDITXVXQ
                  70        80        90       100       110       120

orf4a     VPTAPLGLYPGKLKSLXXVKXGSTVSAPNDPXXFXRVLVMLDELGXIKLKDXIXXXXXXX
                 130       140       150       160       170       180

The complete length ORF4a nucleotide sequence <SEQ ID 219> is: 

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAANAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CANATCCAAC CCGAGCTGGA

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTNTACCGAC TATGTGCGCN 

251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTNCAACAC 

301 ANACNCTATC TTGACGACTN CAAAAAANAA CACAATCTGG ACATCACCNN
351 AGTCTTNCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA

401 AATCGCTGGA NNAAGTCAAA GANGGCAGCA CCGTATCCGC GCCCAACGAC
451 CCGTNNNACT TCGNCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTNGAT
501 CAAACTCAAA GACNGCATCA NNNNGNNGNN NNNANCNANA NNNGANANNN 
551 NNNNANNNNT NNNNNNNNNN NNNNNCNNCG NNNNNNNANN NNNNNNNNNN 
601 NCGNNTNNNN NNGCNNNNNT NNANNNTNNN NNCNNCNNNN NNNNNTNNNN 
651 NANNANNAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT 
701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA
751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA
801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG
851 GCGCAGCCAA ATAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 220>: 

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AXKEIVFGTT

51 VGDFGDMVKE XIQPELEKKG YTVKLVEXTD YVRXNLALAE GELDINVXQH

101 XXYLDDXKKX HNLDITXVXQ VPTAPLGLYP GKLKSLXXVK XGSTVSAPND

151 PXXFXRVLVM LDELGXIKLK DXIXXXXXXX XXXXXXXXXX XXXXXXXXXX

201 XXXXAXXXXX XXXXXXXXXS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK* 






A leader peptide is underlined. 
Further analysis of these strain A sequences revealed the complete DNA sequence <SEQ ID 221>:

1 ATGAAAACCT TCTTCAAAAC CCTTTCCGCC GCCGCACTCG CGCTCATCCT 

51 CGCCGCCTGC GGCGGTCAAA AAGATAGCGC GCCCGCCGCA TCCGCTTCTG 

101 CCGCCGCCGA CAACGGCGCG GCGAAAAAAG AAATCGTCTT CGGCACGACC 

151 GTCGGCGACT TCGGCGATAT GGTCAAAGAA CAAATCCAAC CCGAGCTGGA

201 GAAAAAAGGC TACACCGTCA AACTGGTCGA GTTTACCGAC TATGTGCGCC
251 CGAATCTGGC ATTGGCTGAG GGCGAGTTGG ACATCAACGT CTTCCAACAC

301 AAACCCTATC TTGACGACTT CAAAAAAGAA CACAATCTGG ACATCACCGA
351 AGTCTTCCAA GTGCCGACCG CGCCTTTGGG ACTGTACCCG GGCAAGCTGA

401 AATCGCTGGA AGAAGTCAAA GACGGCAGCA CCGTATCCGC GCCCAACGAC

451 CCGTCCAACT TCGCCCGCGT CTTGGTGATG CTCGACGAAC TGGGTTGGAT
501 CAAACTCAAA GACGGCATCA ATCCGCTGAC CGCATCCAAA GCGGACATTG
551 CCGAAAACCT GAAAAACATC AAAATCGTCG AGCTTGAAGC CGCGCAACTG
601 CCGCGTAGCC GCGCCGACGT GGATTTTGCC GTCGTCAACG GCAACTACGC

651 CATAAGCAGC GGCATGAAGC TGACCGAAGC CCTGTTCCAA GAACCGAGCT

701 TTGCCTATGT CAACTGGTCT GCCGTCAAAA CCGCCGACAA AGACAGCCAA

751 TGGCTTAAAG ACGTAACCGA GGCCTATAAC TCCGACGCGT TCAAAGCCTA

801 CGCGCACAAA CGCTTCGAGG GCTACAAATC CCCTGCCGCA TGGAATGAAG 

851 GCGCAGCCAA ATAA 

This encodes a protein having amino acid sequence <SEQ ID 222; ORF4a-1>:

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SASAAADNGA AKKEIVFGTT

51 VGDFGDMVKE QIQPELEKKG YTVKLVEFTD YVRPNLALAE GELDINVFQH 

101 KPYLDDFKKE HNLDITEVFQ VPTAPLGLYP GKLKSLEEVK DGSTVSAPND 

151 PSNFARVLVM LDELGWIKLK DGINPLTASK ADIAENLKNI KIVELEAAQL 

201 PRSRADVDFA VVNGNYAISS GMKLTEALFQ EPSFAYVNWS AVKTADKDSQ

251 WLKDVTEAYN SDAFKAYAHK RFEGYKSPAA WNEGAAK* 

ORF4a-1 and ORF4-1 show 99.7% identity in 287 aa overlap:

                  10        20        30        40        50        60
orf4a-1   MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1    MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE
                  10        20        30        40        50        60

                  70        80        90       100       110       120
orf4a-1   QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ
          ||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1    QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ
                  70        80        90       100       110       120

                 130       140       150       160       170       180
orf4a-1   VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1    VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK
                 130       140       150       160       170       180

                 190       200       210       220       230       240
orf4a-1   ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1    ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS
                 190       200       210       220       230       240

                 250       260       270       280
orf4a-1   AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX
          ||||||||||||||||||||||||||||||||||||||||||||||||
orf4-1    AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX
                 250       260       270       280
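
The identity figure quoted for such an alignment can be reproduced by counting the columns in which both sequences carry the same residue. The sketch below assumes identity is expressed as a percentage of all alignment columns, with gap columns counted as non-identical; the function name `percent_identity` is illustrative, and the alignment program actually used may apply a different convention.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns holding the same residue.

    Columns with a gap in either sequence count as non-identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b)
        if a == b and a != gap
    )
    return 100.0 * matches / len(aligned_a)


# Residues 61-120 of ORF4a-1 and ORF4-1; only position 64 differs (P vs A).
orf4a_1 = "QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ"
orf4_1 = "QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ"
print(round(percent_identity(orf4a_1, orf4_1), 1))
```

Over the full 287-residue overlap, a single substitution corresponds to 286 identical positions out of 287, which rounds to the 99.7% stated above.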

Homology with an outer membrane protein of Pasteurella haemolytica (accession q08869).
ORF4 and this outer membrane protein show 33% aa identity in 91aa overlap: 
10 20

MNFKKLLGVALVS ALALT ACKDEKAQAP 

I I I : : I I I I I : I I : I : I 

VXTPNPDGRTPCPSFLFETATTSGENMKTFFKTLSAAAL— ALILAACGFKKTARPPHPL 
110 120 130 140 150 

30 40 50 60 70 80 

lic2 pasha -attaktenkaplkvgvmtgpeaqmtevavkiakekygldvelvqfteytqpnaalhskd 

: : : I : | : : I : : I : : : : Ml I = M : M : I : : II I I = 

ORF4 LpppTTARRKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVEFTDYVRPNLALAEGE
160 170 180 190 200 210 

90 100 110 120 130 140 

lip2 . pasha LDANAFQTVPYLEQEVKDRGYKLAIIGNTLWPIAAYSKKIKNISELKDGATVAIPNNAS 
I 

ORF4 L 

Homology with a predicted ORF from N. gonorrhoeae

ORF4 shows 93.6% identity over a 94aa overlap with a predicted ORF (ORF4.ng) from N. gonorrhoeae:



Iip2. pasha 
ORF4 



10 20 30 

MKTFFKTLSAAALALILAACGXQKDSAPAA 
1 I II I I I I I : I : I I I I I I i I I I I I I I I I I 
RANAVXTPNPDGRTPCLSFLFETATTSGENMKTFFKTLSTASLALILAACGGQKDSAPAA 
200 210 220 230 240 250 

40 50 60 70 80 89 

SASA-AADNGAAKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVE FTDYVRPNLALA 
11:1 : M I I I I I I I I I I I I I I i I I I I I I I M I 11 I I II I I I I M I I I I I I I I I I I I I I I 
SAAAPSADNGAAKKEIVFGTTVGDFGDMVKEQIQAELEKKGYTVKLVE FT DYVRPNLALA 
260 270 280 290 300 310 



orf4nm.pep EGEL 
I I I I 

orf4ng EGELDINVFQHKPYLDDFKKEHNLDITEAFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPN 
320 330 340 350 360 370 

The complete length ORF4ng nucleotide sequence <SEQ ID 223> was predicted to encode a
protein having amino acid sequence <SEQ ID 224>: 

1 MKTFFKTLST ASLALILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT

51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ 

101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN 

151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ 

201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS 

251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK* 

Further analysis revealed the complete length ORF4ng DNA sequence <SEQ ID 225> to be: 

1 atgAAAACCT TCTTCAAAAC cctttccgcc gccgcaCTCG CGCTCATCCT 

51 CGCAGCCTGc ggCggtcaAA AAGACAGCGC GCCCgcagcc tctgcCGCCG 

101 CCCCTTCTGC CGATAACGgc gCgGCGAAAA AAGAAAtcgt ctTCGGCACG 

151 Accgtgggcg acttcggcgA TAtggTCAAA GAACAAATCC AagcCGAgct 

201 gGAGAAAAAA GgctACACcg tcAAattggt cgaatttacc gactatgtGC 

251 gCCCGAATCT GGCATTGGCG GAGGGCGAGT TGGACATCAA CGTCTTCCAA 

301 CACAAACCCT ATCTTGACGA TTTCAAAAAA GAACACAACC TGGACATCAC 

351 CGAAGCCTTC CAAGTGCCGA CCGCGCCTTT GGGACTGTAT CCGGGCAAAC 

401 TGAAATCGCT GGAAGAAGTC AAAGACGGCA GCACCGTATC CGCGCCCAac 

451 gACccgTCCA ACTTCGCACG CGCCTTGGTG ATGCTGAACG AACTGGGTTG 

501 GATCAAACTC AAAGACGGCA TCAATCCGCT GACCGCATCC AAAGCCGACA 

551 TCGCGGAAAA CCTGAAAAAC ATCAAAATCG TCGAGCTTGA AGCCGCACAA 

601 CTGCCGCGCA GCCGCGCCGA CGTGGATTTT GCCGTCGTCA ACGGCAACTA 

651 CGCCATAAGC AGCGGCATGA AGCTGACCGA AGCCCTGTTC CAAGAGCCGA 

701 GCTTTGCCTA TGTCAACTGG TCTGCCgtcA AAACCGCCGA CAAAGACAGC 

751 CAATGGCTTA AAGACGTAAC CGAGGCCTAT AACTCCGACG CGTTCAAAGC 

801 CTACGCGCAC AAACGCTTCG AGGGCTACAA ATACCCTGCC GCATGGAATG 

851 AAGGCGCAGC CAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 226; ORF4ng-1>: 

1 MKTFFKTLSA AALALILAAC GGQKDSAPAA SAAAPSADNG AAKKEIVFGT 

51 TVGDFGDMVK EQIQAELEKK GYTVKLVEFT DYVRPNLALA EGELDINVFQ 

101 HKPYLDDFKK EHNLDITEAF QVPTAPLGLY PGKLKSLEEV KDGSTVSAPN 

151 DPSNFARALV MLNELGWIKL KDGINPLTAS KADIAENLKN IKIVELEAAQ 

201 LPRSRADVDF AVVNGNYAIS SGMKLTEALF QEPSFAYVNW SAVKTADKDS 

251 QWLKDVTEAY NSDAFKAYAH KRFEGYKYPA AWNEGAAK* 
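
A DNA listing such as the ORF4ng sequence can be checked against its printed protein by translating it in frame 1. The following is a minimal sketch under stated assumptions: it uses the standard genetic code, and the helper name `translate` and the table construction are illustrative choices, not part of the original analysis.

```python
# Minimal sketch: translate a DNA listing with the standard genetic code.
# Assumptions: reading frame 1, standard code; `translate` is a hypothetical helper.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the codon table from the compact T/C/A/G-ordered amino acid string.
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1.

    Whitespace and position numbers are ignored; codons containing
    anything other than A/C/G/T are reported as X.
    """
    seq = "".join(c for c in dna.upper() if c.isalpha())
    usable = len(seq) - len(seq) % 3  # drop an incomplete trailing codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Start and end of the ORF4ng listing, mixed case as printed.
print(translate("atgAAAACCT TCTTCAAAAC c"))
print(translate("GCAGCC AAATAA"))
```

Because the helper strips digits and spaces, a listing can be pasted in with its position labels still attached.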

This shows 97.6% identity in 288 aa overlap with ORF4-1: 

                    10        20        30         40        50       59
orf4-1.pep  MKTFFKTLSAAALALILAACGGQKDSAPAASASA-AADNGAAKKEIVFGTTVGDFGDMVK
            |||||||||||||||||||||||||||||||| |  ||||||||||||||||||||||||
orf4ng-1    MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDMVK
                    10        20        30        40        50        60

           60        70        80        90       100       110      119
orf4-1.pep  EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVF
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||| |
orf4ng-1    EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF
                    70        80        90       100       110       120

          120       130       140       150       160       170      179
orf4-1.pep  QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTAS
            ||||||||||||||||||||||||||||||||||||| |||| |||||||||||||||||
orf4ng-1    QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS
                   130       140       150       160       170       180

          180       190       200       210       220       230      239
orf4-1.pep  KADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNW
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf4ng-1    KADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNW
                   190       200       210       220       230       240

          240       250       260       270       280
orf4-1.pep  SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAKX
            ||||||||||||||||||||||||||||||||||||| |||||||||||
orf4ng-1    SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAKX
                   250       260       270       280
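
The percent identity quoted for an alignment such as the one above is the fraction of aligned columns in which both sequences carry the same residue. A minimal sketch follows, assuming the two rows are supplied already aligned (equal length, `-` for gaps) and that gap columns count toward the overlap; alignment programs differ on that last point, and the function name `percent_identity` is hypothetical.

```python
# Minimal sketch: percent identity between two pre-aligned sequence rows.
# Assumptions: rows have equal length, '-' marks a gap, and gap columns
# are counted in the overlap. `percent_identity` is a hypothetical helper.

def percent_identity(row_a: str, row_b: str) -> tuple[int, int, float]:
    """Return (identical columns, overlap length, percent identity)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    if not row_a:
        raise ValueError("aligned rows must not be empty")
    identical = sum(
        1
        for a, b in zip(row_a.upper(), row_b.upper())
        if a == b and a != "-"
    )
    overlap = len(row_a)
    return identical, overlap, 100.0 * identical / overlap


# First row of the ORF4-1 / ORF4ng-1 alignment.
orf4_1 = "MKTFFKTLSAAALALILAACGGQKDSAPAASASA-AADNGAAKKEIVFGTTVGDFGDMVK"
orf4ng_1 = "MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDMVK"
identical, overlap, pct = percent_identity(orf4_1, orf4ng_1)
print(f"{identical}/{overlap} identical columns ({pct:.1f}%)")
```

Applying the same function to the concatenated rows of a full alignment gives the whole-overlap figure, subject to the gap-counting assumption above.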

In addition, ORF4ng-1 shows significant homology with an outer membrane protein from the 
database: 

ID LIP2_PASHA STANDARD; PRT; 276 AA. 
AC Q08869; 
DT 01-NOV-1995 (REL. 32, CREATED) 
DT 01-NOV-1995 (REL. 32, LAST SEQUENCE UPDATE) 
DT 01-NOV-1995 (REL. 32, LAST ANNOTATION UPDATE) 
DE 28.2 KD OUTER MEMBRANE PROTEIN PRECURSOR. . . . 

SCORES Initl: 279 Initn: 416 Opt: 494 

Smith-Waterman score: 494; 36.0% identity in 275 aa overlap 

10 20 30 40 50 

55 orf 4ng-l .pep MKTFFKTLSAAAL— ALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDM 

I I I : : 1 I I I I : I I : I : I I I : : I : : : I I I I : : I : : I 

lip2_pasha MNFKKLLGVALVSALALTACKDEKAQAPATTA KTENKAPLK VGVMTGPEAQM 

10 20 30 40 50 

60 60 70 80 90 100 110 

orf 4ng-l . pep VKEQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITE 

: : : : II I I : I I : I I : I : : I! I I : 1 I I : I I III:: I : : : : : 
120 130 140 150 160 170 

5 orf 4ng-l -pep AFQVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLT 

: : : I : : I I : I : : I : I I I : I I : I I : I I I I I I : : I : I : I I I I I : 
1 ip2_pa sha I GNTLWP I AAYSKKIKNI SELKDGATVAI PNNASNTARALLLLQAHGLLKLKDPKN-VF 

120 130 140 150 160 170 

10 180 190 200 210 220 230 

orf 4ng-l.pep ASKADIAENLKNIKIVELEAAQLPRSRADVDFAVWGNYAISSGMKLTE — ALFQEPSFA 

I : : II I I I M I I I : : : : I I M : : M : I : : M : : I : : : : : : 
lip2_pasha ATENDIIENPKNIKIVQADTSLLTRMLDDVELAVINNTYAGQAGLSPDKDGIIVESKDSP 
180 190 200 210 220 230 

15 

240 250 260 270 280 289 

orf 4ng-l . pep YVNWSAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAKX 

III : : : I I : I : ::::::: I I I : I 

lip2__pasha YVNLVVSREDNKDDPRLQTFVKSFQTEEVFQEALKLFNGGWKGW 
20 240 250 260 270 

Based on this analysis, including the homology with the outer membrane protein of Pasteurella 
haemolytica, and on the presence of a putative prokaryotic membrane lipoprotein lipid attachment 
site in the gonococcal protein, it was predicted that these proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
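
A lipoprotein lipid attachment site is a short motif near the N-terminus that ends in the cysteine to be lipidated. The sketch below illustrates one way to scan for such a site. It is deliberately simplified: it searches for the commonly cited lipobox consensus L-[A/S/T]-[G/A]-C rather than the full database pattern used in the original analysis, and the 40-residue window and the function name `find_lipobox` are assumptions made for illustration.

```python
# Minimal sketch: scan the N-terminal region of a protein for a lipobox.
# Assumptions: simplified consensus L-[AST]-[GA]-C (not the full database
# pattern), a 40-residue search window, and the hypothetical name find_lipobox.
import re
from typing import Optional

LIPOBOX = re.compile(r"L[AST][GA]C")


def find_lipobox(protein: str, window: int = 40) -> Optional[int]:
    """Return the 1-based position of the lipobox cysteine, or None."""
    n_terminus = protein[:window].upper()
    match = LIPOBOX.search(n_terminus)
    # match.end() is the 0-based index just past the cysteine,
    # which equals the cysteine's 1-based position.
    return match.end() if match else None


# N-terminal 30 residues of ORF4ng-1 (SEQ ID 226).
print(find_lipobox("MKTFFKTLSAAALALILAACGGQKDSAPAA"))
```

A hit from a simplified pattern like this is only suggestive; signal peptide context would still need to be examined.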

ORF4-1 (30 kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described above. 
The products of protein expression and purification were analyzed by SDS-PAGE. Figures 8A and 
8B show, respectively, the results of affinity purification of the His-fusion and GST-fusion 
proteins. Purified His-fusion protein was used to immunise mice, whose sera were used for ELISA 
(positive result), Western blot (Figure 8C), FACS analysis (Figure 8D), and a bactericidal assay 
(Figure 8E). These experiments confirm that ORF4-1 is a surface-exposed protein, and that it is a 
useful immunogen. 

Figure 8F shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF4-1. 
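
Hydrophilicity traces of this kind are typically produced by averaging a per-residue scale over a sliding window. The sketch below is an illustration under assumptions that the text does not state: it uses the Kyte-Doolittle hydropathy scale (negative values indicate hydrophilic stretches), a 7-residue window, and the hypothetical function name `hydropathy_profile`. The scale and window behind the figure itself may differ.

```python
# Minimal sketch: sliding-window hydropathy profile of a protein sequence.
# Assumptions: Kyte-Doolittle scale, window of 7, unknown residues (e.g. X)
# scored as 0.0; `hydropathy_profile` is a hypothetical helper.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 7) -> list[float]:
    """Return the mean hydropathy of every full window along the sequence."""
    seq = "".join(c for c in protein.upper() if c.isalpha())
    if window < 1 or len(seq) < window:
        return []
    scores = [KYTE_DOOLITTLE.get(residue, 0.0) for residue in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# First 30 residues of ORF4ng-1; strongly negative windows are hydrophilic.
profile = hydropathy_profile("MKTFFKTLSAAALALILAACGGQKDSAPAA")
print([round(value, 2) for value in profile])
```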
Example 27 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 227>: 

1 CCTCGTCGTC CTCGGCATGC TCCAGTTTCA AGGGGCGATT TACTCCAAGG 

51 CGGTGGAACG TATGCTCGGC ACGGTCATCG GGCTGGGCGC GGGTTTGGGC 

101 GTTTTATGGC TGAACCAGCA TTATTTCCAC GGCAACCTCC TCTTCTACCT 

151 CACCGTCGGC ACGGCAAGCG CACTGGCCGG CTGGGCGGCG GTCGGCAAAA 

201 ACGGCTACGT CCCTmTGCTG GCAGGGCTGA CGATGTGTAT GCTCATCGGC 

251 GACAACGGCA GCGAATGGCT CGACAGCGGA CTCATGCGCG CCATGAACGT 

301 CCTCATCGGC GyGGCCATCG CCATCGCCGC CGCCAAACTG CTGCCGCTGA 

351 AATCCACACT GATGTGGCGT TTCATGCTTG CCGACAACCT GGCCGACTGC 

401 AGCAAAATGA TTGCCGAAAT CAGCAACGGC AGGCGCATGA CCCGCGAACG 

451 CCTCGAGGAG AACATGGCGA AAATGCGCCA AATCAACGCA CGCATGGTCA 

501 AAAGCCGCAG CCATCTCGCC GCCACATCGG GCGAAAGCTG CATCAGCCCC 

551 GCCATGATGG AAGCCATGCA GCACGCCCAC CGTAAAATCG TCAACACCAC 

601 CGAGCTGCTC CTGACCACCG CCGCCAAGCT GCAATCTCCC AAACTCAACG 

651 GCAGCGAAAT CCGGCTGCTT GACCGCCACT TCACACTGCT CCAAAC 

701 GC AGACACGCCC GCCGCATCCG 

751 CATCGACACC GCCATCAACC CCGAACTGGA AGCCCTCGCC GAACACCTCC 

801 ACTACCAATG GCAGGGCTTC CTCTGGCTCA GCACCGATAT GCGTCAGGAA 
851 ATTTCCGCCC TCGTCATCCT GCTGCAACGC ACCCGCCGCA AATGGCTGGA 
901 TGCCCACGAA CGCCAACACC TGCGCCAAAG CCTGCTTGA 

This corresponds to the amino acid sequence <SEQ ID 228; ORF8>: 

1 prrp RHAPVSRGDL LQGGGTYARH GHRAGRGFGR FMAEPALFPR 

51 QPPLLPHRRH GKRTGRLGGG RQKRLRPXAG RADDVYAHRR QRQRMARQRT 

101 HARHERPHRR GHRHRRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ 

151 AHDPRTPRGE HGENAPNQRT HGQKPQPSRR HIGRKLHQPR HDGSHAARPP 

201 XNRQHHRAAP DHRRQAAISQ TQRQRNPAAX PPLHTAPN Q 

251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGLP LAQHRYASGN FRPRHPAATH 

301 PPQMAGCPRT PTPAPKPA* 

Computer analysis of this amino acid sequence gave the following results: 
Sequence motifs 

ORF8 is proline-rich and has a distribution of proline residues consistent with a surface 
localization. Furthermore the presence of an RGD motif may indicate a possible role in bacterial 
adhesion events. 
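
Both observations, proline richness and the presence of an RGD motif, are simple to check directly on a sequence. The sketch below is illustrative only; the helper names `motif_positions` and `residue_fraction` are hypothetical, and the example uses just the first printed row of the ORF8 sequence rather than the full protein.

```python
# Minimal sketch: locate a short motif and measure residue composition.
# Assumptions: exact string matching on one-letter codes; the helper
# names motif_positions and residue_fraction are hypothetical.

def motif_positions(protein: str, motif: str) -> list[int]:
    """Return 1-based start positions of every (possibly overlapping) match."""
    seq = protein.upper()
    target = motif.upper()
    width = len(target)
    return [
        i + 1
        for i in range(len(seq) - width + 1)
        if seq[i:i + width] == target
    ]


def residue_fraction(protein: str, residue: str) -> float:
    """Return the fraction of positions occupied by the given residue."""
    seq = protein.upper()
    return seq.count(residue.upper()) / len(seq) if seq else 0.0


# First printed row of ORF8 (SEQ ID 228).
orf8_start = "PRRPRHAPVSRGDLLQGGGTYARHGHRAGRGFGRFMAEPALFPR"
print("RGD at:", motif_positions(orf8_start, "RGD"))
print("proline fraction:", round(residue_fraction(orf8_start, "P"), 3))
```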

Homology with a predicted ORF from N. gonorrhoeae 

ORF8 shows 86.5% identity over a 312aa overlap with a predicted ORF (ORF8.ng) from N. 
gonorrhoeae: 



orf8ng      1 MDRDDRLRRPRHAPVPRRDLLQRGGTYARYGHRAGRGFGRFMAEPALFPR 50
                     |||||||| | |||| |||||| ||||||||||||||||||||
orf8.pep    1       PRRPRHAPVSRGDLLQGGGTYARHGHRAGRGFGRFMAEPALFPR 44

orf8ng     51 QPPLLPDHRHGKRTGRLGGGRQKRLRPYVGGADDVHAHRRQRQRMARQRP 100
              ||||||  |||||||||||||||||||  | |||| |||||||||||||
orf8.pep   45 QPPLLPHRRHGKRTGRLGGGRQKRLRPXAGRADDVYAHRRQRQRMARQRT 94

orf8ng    101 DARDERPHRRRHRHCRRQTAAAEIHTDVAFHACRQPGRLQQNDCRNQQRQ 150
               || |||||| ||| |||||||||||||||||||||||||||||||||||
orf8.pep   95 HARHERPHRRGHRHRRRQTAAAEIHTDVAFHACRQPGRLQQNDCRNQQRQ 144

orf8ng    151 AYDARTFGAEYGQNAPNQRTHGQKPQPPRRHIGRKPHQPLHDGSHAARPP 200
              | | ||   | | |||||||||||||| ||||||| ||| ||||||||||
orf8.pep  145 AHDPRTPRGEHGENAPNQRTHGQKPQPSRRHIGRKLHQPRHDGSHAARPP 194

orf8ng    201 QNRQHHRAAPDHRRQAAISQTQRQRNPAARPPLHTAPNRPATNRRPHQRQ 250
               |||||||||||||||||||||||||||| ||||||||
orf8.pep  195 XNRQHHRAAPDHRRQAAISQTQRQRNPAAXPPLHTAPN           Q 244

orf8ng    251 TRPPHPHRHRHQPRTGSPRRTPPLPMAGFPLAQHQYASGNFRPRHPPATH 300
              |||||||||||||||||||||||||||| ||||| ||||||||||| |||
orf8.pep  245 TRPPHPHRHRHQPRTGSPRRTPPLPMAGLPLAQHRYASGNFRPRHPAATH 294

orf8ng    301 PPQMAGCPRTPTPAPKPA* 319
              |||||||||||||||||||
orf8.pep  295 PPQMAGCPRTPTPAPKPA* 313





The complete length ORF8ng nucleotide sequence <SEQ ID 229> is predicted to encode a protein 
having amino acid sequence <SEQ ID 230>: 

1 MDRDDRLRRP RHAPVPRRDL LQRGGTYARY GHRAGRGFGR FMAEPALFPR 

51 QPPLLPDHRH GKRTGRLGGG RQKRLRPYVG GADDVHAHRR QRQRMARQRP 

101 DARDERPHRR RHRHCRRQTA AAEIHTDVAF HACRQPGRLQ QNDCRNQQRQ 

151 AYDARTFGAE YGQNAPNQRT HGQKPQPPRR HIGRKPHQPL HDGSHAARPP 

201 QNRQHHRAAP DHRRQAAISQ TQRQRNPAAR PPLHTAPNRP ATNRRPHQRQ 

251 TRPPHPHRHR HQPRTGSPRR TPPLPMAGFP LAQHQYASGN FRPRHPPATH 

301 PPQMAGCPRT PTPAPKPA* 

Based on the sequence motifs in these proteins, it is predicted that the proteins from N. meningitidis 
and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 



Example 28 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 231>: 

1 ..GAAATCAGCC TGCGGTCCGA CNACAGGCCG GTTTCCGTGN CGAAGCGGCG 
51 GGATTCGGAA CGTTTTCTGC TGTTGGACGG CGGCAACAGC CGGCTCAAGT 
101 GGGCGTGGGT GGAAAACGGC ACGTTCGCAA CCGTCGGTAG CGCGCCGTAC 
151 CGCGATTTGT CGCCTTTGGG CGCGGAGTGG GCGGAAAAGG CGGATGGAAA 

201 TGTCCGCATC GTCGGTTGCG CTGTGTGCGG AGAATTCAAA AAGGCACAAG 
251 TGCAGGAACA GCTCGCCCGA AAAATCGAGT GGCTGCCGTC TTCCGCACAG 
301 GCTTT.GGCA TACGCAACCA CTACCGCCAC CCCGAAGAAC ACGGTTCCGA 
351 CCGCTGGTTC AACGCCTTGG GCAGCCGCCG CTTCAGCCGC AACGCCTGCG 
401 TCGTCGTCAG TTGCGGCACG GCGGTAACGG TTGACGCGCT CACCGATGAC 
451 GGACATTATC TCGGAGA.GG AACCATCATG CCCGGTTTCC ACCTGATGAA 

501 AGAATCGCTC GCCGTCCGAA CCGCCAACCT CAACCGGCAC GCCGGTAAGC 
551 GTTATCCTTT CCCGACCGG.. 

This corresponds to the amino acid sequence <SEQ ID 232; ORF61>: 

1 ..EISLRSDXRP VSVXKRRDSE RFLLLDGGNS RLKWAWVENG TFATVGSAPY 
51 RDLSPLGAEW AEKADGNVRI VGCAVCGEFK KAQVQEQLAR KIEWLPSSAQ 
101 AXGIRNHYRH PEEHGSDRWF NALGSRRFSR NACVVVSCGT AVTVDALTDD 

151 GHYLGXGTIM PGFHLMKESL AVRTANLNRH AGKRYPFPT.. 

Further work revealed the complete nucleotide sequence <SEQ ID 233>: 

1 ATGACGGTTT TGAAGCTTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA 

51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC 

101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG 

151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT 

201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA 

251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 

301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT 

351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 

401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT 

451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGTC GGCGCGCCTT 

501 GTCGCGTTTA GGTTTGGATG TGCAGATTAA GTGGCCCAAT GATTTGGTTG 

551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC 

601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTTG TCCTGCCCAA 

651 GGAAGTAGAA AATGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC 

701 GGCGGGGCAA TGCCGATGCC GCCGTGCTGC TGGAAACGCT GTTGGTGGAA 

751 CTGGACGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT 

801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 

851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA 

901 CAAGGCGTTT TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG 

951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC 

1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC 

1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC 

1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG 

1151 GAAATGTCCG CATCGTCGGT TGCGCTGTGT GCGGAGAATT CAAAAAGGCA 

1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 

1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 

1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 

1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 

1401 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA 

1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG 

1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT 

1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA 

1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 

1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 

1701 GCGCGTGGCG GACAACCTCG TCATTTACGG GTTGTTGAAC ATGATTGCCG 

1751 CCGAAGGCAG GGAATATGAA CATATTTAA 

This corresponds to the amino acid sequence <SEQ ID 234; ORF61-1>: 

1 MTVLKLSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG 

51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL 

101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY 

151 ELGSLSPVAA VACRRALSRL GLDVQIKWPN DLVVGRDKLG GILIETVRTG 

201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLVE 

251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG 

301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL 

351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGEFKKA 

401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA 

451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK 

501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA 

551 AKVAEALPPA FLAENTVRVA DNLVIYGLLN MIAAEGREYE HI* 

Figure 9 shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF61-1. Further 
computer analysis of this amino acid sequence gave the following results: 
Homology with the baf protein of B. pertussis (accession number U12020). 
ORF61 and baf protein show 33% aa identity in 166aa overlap: 

orf61 23 LLLDGGNSRLKWAWVE-NGTFATVGSAPYR DLS PLGAEWAEKADGNVRIVGCA VCG 77 

+L+D GNSRLK W + + A AP DL LG A R +G V G 

baf 3 ILIDSGNSRLKVGWFDPDAPQAAREPAPVAFDNLDLDALGRWLATLPRRPQRALGVNVAG 62 

orf61 7 8 E FKKAQVQE QLAR KIEWLPSSAQAXGIRNHYRHPEEHGSDRW FNALGSRRFSRN 131 

+ + L I WL + A G+RN YR+P++ G+DRW L + 

baf 63 LARGEAIAATLRAGGCDIRWLRAQPLAMGLRNGYRNPDQLGADRWACMVGVLARQPSVHP 122 

orf61 132 ACVVVSCGTAVTVDALTDDGHYLGXGTIMPGFHLMKESLAVRTANL 177 

+V S GTA T+D + D + G G I+PG +M+ +LA TA+L 
baf 123 PLLVAS FGTATTLDTIGPDNVFPG-GLILPGPAMMRGALAYGTAHL 167 

Homology with a predicted ORF from N. meningitidis (strain A) 
ORF61 shows 97.4% identity over a 189aa overlap with an ORF (ORF61a) from strain A of N. 
meningitidis: 

10 20 30 

orf 61 .pep E I SLRS DXRPVSVXKRRDSERFLLLDGGNS 

At r „ I I I I I M I I I I I I I I I I I I I I I I I I I I I 

43 orf61a TVFEGTVKGVDGQGVLHLETAEGKQTWSGEISLRSDDRPVSVPKRRDSERFLLLDGGNS 

290 300 310 320 330 340 

40 50 60 70 80 90 

n orf 61 .pep RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEJCADGNVRIVGCAVCGEFKKAQVQEQLAR 

iU I I I I M I I II [[ I I !! I I I I I I I I II I I I I I I I : I I I I I [ I I I I I I | | | M II I I I I I I I 

o r f 6 1 a RLKWAWVENGT FATVGS APYRDL S PLGAE WAEKVDGNVRI VGCAVCGE FKKAQVQEQLAR 

350 360 370 380 390 400 

« 100 HO 120 130 140 150 

orf 61. pep KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRN ACVWSCGT AVTVDALTDD 

Nil Ml I | | | || | M I M II 

orf61a KIEWLPSSAQALGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD 
410 420 430 440 450 460 

160 170 180 189 

orf61 pep GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT 

I I I I I I I I I I I I I I I I I I I I I I I I I 

orf61a GHYLG-GTIMPGFHLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMM 
470 480 490 500 510 520 

orf61a HGRLKEKTGAGKPVDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGG 
530 540 550 ' 560 570 580 

The complete length ORF61a nucleotide sequence <SEQ ID 235> is: 



1 ATGACGGTTT TGAAGCCTTC GCACTGGCGG GTGTTGGCGG AGCTTGCCGA
51 CGGTTTGCCG CAACACGTCT CGCAACTGGC GCGTATGGCG GATATGAAGC
101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA CATACGCGGG
151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CATTGGCGGT
201 TTTCGATGCC GAAGGTTTGC GCGAGCTGGG GGAAAGGTCG GGTTTTCAGA
251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG
301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGTG TGACCCACCT
351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG
401 GCGAGTGTCT GATGTTCAGT TTTGGCTGGG TGTTTGACCG GCCGCAGTAT
451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA GTGGCGTGCC GGCGCGCCTT
501 GTCGCGTTTG GGTTTGAAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG
551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACGGT CAGGACGGGC
601 GGCAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA
651 GGAAGTGGAA AACGCCGCTT CCGTGCAATC GCTGTTTCAG ACGGCATCGC
701 GGCGGGGAAA TGCCGATGCC GCCGTGTTGC TGGAAACGCT GTTGGCGGAA
751 CTTGATGCGG TGTTGTTGCA ATATGCGCGG GACGGATTTG CGCCTTTTGT
801 GGCGGAATAT CAGGCTGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT
851 TGCGCGACGG CGAAACCGTG TTCGAAGGCA CGGTTAAAGG CGTGGACGGA
901 CAAGGCGTTC TGCACTTGGA AACGGCAGAG GGCAAACAGA CGGTCGTCAG
951 CGGCGAAATC AGCCTGCGGT CCGACGACAG GCCGGTTTCC GTGCCGAAGC
1001 GGCGGGATTC GGAACGTTTT CTGCTGTTGG ACGGCGGCAA CAGCCGGCTC
1051 AAGTGGGCGT GGGTGGAAAA CGGCACGTTC GCAACCGTCG GTAGCGCGCC
1101 GTACCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGTGGATG
1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATT CAAAAAGGCA
1201 CAAGTGCAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC
1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT
1301 CCGACCGCTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC
1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA
1401 TGACGGACAT TATCTCGGGG GAACCATCAT GCCCGGTTTC CACCTGATGA
1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGGCA CGCCGGTAAG
1501 CGTTATCCTT TCCCGACCAC AACGGGCAAT GCCGTCGCCA GCGGCATGAT
1551 GGATGCGGTT TGCGGCTCGG TTATGATGAT GCACGGGCGT TTGAAAGAAA
1601 AAACCGGGGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG
1651 GCAAAAGTTG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT
1701 GCGCGTGGCG GACAACCTCG TCATTCACGG GCTGCTGAAC CTGATTGCCG
1751 CCGAAGGCGG GGAATCGGAA CATACTTAA



This encodes a protein having amino acid sequence <SEQ ID 236>: 



1 MTVLKPSHWR VLAELADGLP QHVSQLARMA DMKPQQLNGF WQQMPAHIRG 

51 LLRQHDGYWR LVRPLAVFDA EGLRELGERS GFQTALKHEC ASSNDEILEL 

101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWVFDRPQY 

151 ELGSLSPVAA VACRRALSRL GLKTQIKWPN DLVVGRDKLG GILIETVRTG 

201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE 

251 LDAVLLQYAR DGFAPFVAEY QAANRDHGKA VLLLRDGETV FEGTVKGVDG 

301 QGVLHLETAE GKQTVVSGEI SLRSDDRPVS VPKRRDSERF LLLDGGNSRL 

351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KVDGNVRIVG CAVCGEFKKA 

401 QVQEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA 

451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRHAGK 

501 RYPFPTTTGN AVASGMMDAV CGSVMMMHGR LKEKTGAGKP VDVIITGGGA 

551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HT* 

ORF61a and ORF61-1 show 98.5% identity in 591 aa overlap: 



10 20 30 40 50 60 

orf61a.pep MTVLKPSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR
           ||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf61-1    MTVLKLSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 61a . pep LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 
! I I I I I I I 1 I I I M I I 1 I I I I 1 I I I I 1 I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I II 
o r f 6 1 - 1 LVRPLAVFDAEGLRELGERSGFQTALKHECAS SNDE I LELARIAPDKAHKT I CVTHLQSK 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 61a. pep GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLKTQ1KWPN 
I I I I I I I I I I I II II II t I I II I I M M II II I I I I I I I M I I M I I M I I I : M I II I 
orf 61-1 GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLDVQIKWPN 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 61a . pep DLVVGRDKLGGILIETVRTGGKTVAWGIGINFVLPKEVENAASVQSLFQTASRRGNADA 
I I I I I I I I I II I I I II I II I I I I I I II I I I I I II I I I I I I II II I I II 1 M I I I I I I II I 
orf 61-1 DLWGRDKLGGILIETVRTGGKTVAWGIGINFVLPKEVENAASVQSLFQTASRRGNADA 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 61a . pep AVLLETLLAELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 
I II I I I I I : I I I ! I I I I I M I I M II I I I II I I I I I I I I II I I I II I I [ I I I II II || I I 
orf 61-1 AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 61a . pep QGVLHLETAEGKQTWSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF 
I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I I I I I II II I I II I I I I I I I I I II II I 
orf 61-1 QGVLHLETAEGKQTWSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 61a . pep ATVGSAPYRDLSPLGAEWAEKVDGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL 
I I I I I I I I I 1 I I I I I I I I I I I : I I I I I II I I I I I I I II I I I II II I I I I I I II 1 1 I I I I I 
orf 61-1 ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL 
370 380 390 400 410 420 

430 440 450 460 470 480 

orf 61a . pep GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDDGHYLGGTIMPGF 
U I II II I I II I I I I I I I I I I I I I I I I I I I I I I I I I | | I I | | || | | | | | | || | | | | | | | | 
orf 61-1 GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVWSCGTAVTVDALTDDGHYLGGTIMPGF 

430 440 450 460 470 480 

490 500 510 520 530 540 

orf 61a . pep HLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP 
I M I I I I I I I I I I I I I I I I I I II I I I I I I I I I I M II I II I I I I I I I I I I I I I II I I I M 
O r f 6 1 - 1 HLMKE S LAVRT ANLNRHAGKRYP FPTTTGNAVASGMMDAVCG S VMMMHGRLKEKT GAGKP 

490 500 510 520 530 540 

550 560 570 580 590 

orf 61a . pep VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHTX 

Ml I I I I I I I I I I I M I I I I I I I I I I I I ! : || I I : | | | | | | || 

or f 6 1 - 1 VDVI ITGGGAAKVAEAL P PAFL AENTVRVADNLVI YGLLNMIAAEGRE YEH IX 

550 560 570 580 590 

Homology with a predicted ORF from N. gonorrhoeae 

ORF61 shows 94.2% identity over a 189aa overlap with a predicted ORF (ORF61.ng) from N. 
gonorrhoeae: 



orf61.pep                               EISLRSDXRPVSVXKRRDSERFLLLDGGNS 30
                                        ||||| | | ||| || |||||||| ||||
orf61ng   TVCEGTVKGVDGRGVLHLETAEGEQTVVSGEISLRPDNRSVSVPKRPDSERFLLLEGGNS 211

orf61.pep RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLAR 90
          |||||||||||||||||||||||||||||||||||||||||||||||| ||||| |||||
orf61ng   RLKWAWVENGTFATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGESKKAQVKEQLAR 271

orf61.pep KIEWLPSSAQAXGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD 150
          ||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||
orf61ng   KIEWLPSSAQALGIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDD 331

orf61.pep GHYLGXGTIMPGFHLMKESLAVRTANLNRHAGKRYPFPT 189
          ||||| ||||||||||||||||||||||| |||||||||
orf61ng   GHYLG-GTIMPGFHLMKESLAVRTANLNRPAGKRYPFPTTTGNAVASGMMDAVCGSIMMM 390

An ORF61ng nucleotide sequence <SEQ ID 237> was predicted to encode a protein having amino 
acid sequence <SEQ ID 238>: 

1 MFSFGWAFDR PQYELGSLSP VAALACRRAL GCLGLETQIK WPNDLVVGRD 

51 KLGGILIETV RAGGKTVAVV GIGINFVLPK EVENAASVQS LFQTASRRGN 

101 ADAAVLLETL LAELGAVLEQ YAEEGFAPFL NEYETANRDH GKAVLLLRDG 

151 ETVCEGTVKG VDGRGVLHLE TAEGEQTVVS GEISLRPDNR SVSVPKRPDS 

201 ERFLLLEGGN SRLKWAWVEN GTFATVGSAP YRDLSPLGAE WAEKADGNVR 

251 IVGCAVCGES KKAQVKEQLA RKIEWLPSSA QALGIRNHYR HPEEHGSDRW 

301 FNALGSRRFS RNACVVVSCG TAVTVDALTD DGHYLGGTIM PGFHLMKESL 

351 AVRTANLNRP AGKRYPFPTT TGNAVASGMM DAVCGSIMMM HGRLKEKNGA 

401 GKPVDVIITG GGAAKVAEAL PPAFLAENTV RVADNLVIHG LLNLIAAEGG 

451 ESEHA* 

Further analysis revealed the complete gonococcal DNA sequence <SEQ ID 239> to be: 

1 ATGACGGTTT TGAAGCCTTC GCATTGGCGG GTGTTGGCGG AGCTTGCCGA 

51 CGGTTTGCCG CAACACGTAT CGCAATTGGC GCGTGAGGCG GACATGAAGC 

101 CGCAGCAGCT CAACGGTTTT TGGCAGCAGA TGCCGGCGCA TATACGCGGG 

151 CTGTTGCGCC AACACGACGG CTATTGGCGG CTGGTGCGCC CCTTGGCGGT 

201 TTTCGATGCC GAAGGTTTGC GCGATCTGGG GGAAAGGTCG GGTTTTCAGA 

251 CGGCATTGAA GCACGAGTGC GCGTCCAGCA ACGACGAGAT ACTGGAATTG 

301 GCGCGGATTG CGCCGGACAA GGCGCACAAA ACCATATGCG TGACCCACCT 

351 GCAAAGTAAG GGCAGGGGGC GGCAGGGGCG GAAGTGGTCG CACCGTTTGG 

401 GCGAGTGCCT GATGTTCAGT TTCGGCTGGG CGTTTGACCG GCCGCAGTAT 

451 GAGTTGGGTT CGCTGTCGCC TGTTGCGGCA CTTGCGTGCC GGCGCGCTTT 

501 GGGGTGTTTG GGTTTGGAAA CGCAAATCAA GTGGCCAAAC GATTTGGTCG 
551 TCGGACGCGA CAAATTGGGC GGCATTCTGA TTGAAACAGT CAGGGCGGGC 
601 GGTAAAACGG TTGCCGTGGT CGGTATCGGC ATCAATTTCG TGCTGCCCAA 
651 GGAAGTGGAA AACGCCGCTT CCGTGCAGTC GCTGTTTCAG ACGGCATCGC 

701 GGCGGGGCAA TGCCGATGCC GCCGTATTGC TGGAAACATT GCTTGCGGAA 
751 CTGGGCGCGG TGTTGGAACA ATATGCGGAA GAAGGGTTCG CGCCATTTTT 

801 AAATGAGTAT GAAACGGCCA ACCGCGACCA CGGCAAGGCG GTATTGCTGT 
851 TGCGCGACGG CGAAACCGTG TGCGAAGGCA CGGTTAAAGG CGTGGACGGA 
901 CGAGGCGTTC TGCACTTGGA AACGGCAgaa ggcgaACAGa cggtcgtcag 
951 cggcgaaaTC AGcctGCggc ccgacaacaG GTCGGtttcc gtgccgaagc 

1001 ggccggatTC GgaacgtTTT tTGCtgttgg aaggcgggaa cagccgGCTC 

1051 AAGTGGGCGT GggtggAAAa cggcacgttc gcaaccgtgg gcagcgcgCc 

1101 gtaCCGCGAT TTGTCGCCTT TGGGCGCGGA GTGGGCGGAA AAGGCGGATG 

1151 GAAATGTCCG CATCGTCGGT TGCGCCGTGT GCGGAGAATC CAAAAAGGCA 

1201 CAAGTGAAGG AACAGCTCGC CCGAAAAATC GAGTGGCTGC CGTCTTCCGC 

1251 ACAGGCTTTG GGCATACGCA ACCACTACCG CCACCCCGAA GAACACGGTT 

1301 CCGACCGTTG GTTCAACGCC TTGGGCAGCC GCCGCTTCAG CCGCAACGCC 

1351 TGCGTCGTCG TCAGTTGCGG CACGGCGGTA ACGGTTGACG CGCTCACCGA 

1401 TGACGGACAT TATCTCGGCG GAACCATCAT GCCCGGCTTC CACCTGATGA 

1451 AAGAATCGCT CGCCGTCCGA ACCGCCAACC TCAACCGCCC CGCCGGCAAA 

1501 CGTTACCCTT TCCCGACCAC AACGGGCAAC GCCGTCGCAA GCGGCATGAT 

1551 GGACGCGGTT TGCGGCTCGA TAATGATGAT GCACGGCCGT TTGAAAGAAA 

1601 AAAACGGCGC GGGCAAGCCT GTCGATGTCA TCATTACCGG CGGCGGCGCG 

1651 GCGAAAGTCG CCGAAGCCCT GCCGCCTGCA TTTTTGGCGG AAAATACCGT 

1701 GCGCGTGGCG GACAACCTCG TCATCCACGG GCTGCTGAAC CTGATTGCCG 

1751 CCGAAGGCGG GGAATCGGAA CACGCTTAA 

This corresponds to the amino acid sequence <SEQ ID 240; ORF61ng-1>: 

1 MTVLKPSHWR VLAELADGLP QHVSQLAREA DMKPQQLNGF WQQMPAHIRG
51 LLRQHDGYWR LVRPLAVFDA EGLRDLGERS GFQTALKHEC ASSNDEILEL
101 ARIAPDKAHK TICVTHLQSK GRGRQGRKWS HRLGECLMFS FGWAFDRPQY
151 ELGSLSPVAA LACRRALGCL GLETQIKWPN DLVVGRDKLG GILIETVRAG
201 GKTVAVVGIG INFVLPKEVE NAASVQSLFQ TASRRGNADA AVLLETLLAE
251 LGAVLEQYAE EGFAPFLNEY ETANRDHGKA VLLLRDGETV CEGTVKGVDG
301 RGVLHLETAE GEQTVVSGEI SLRPDNRSVS VPKRPDSERF LLLEGGNSRL
351 KWAWVENGTF ATVGSAPYRD LSPLGAEWAE KADGNVRIVG CAVCGESKKA
401 QVKEQLARKI EWLPSSAQAL GIRNHYRHPE EHGSDRWFNA LGSRRFSRNA
451 CVVVSCGTAV TVDALTDDGH YLGGTIMPGF HLMKESLAVR TANLNRPAGK
501 RYPFPTTTGN AVASGMMDAV CGSIMMMHGR LKEKNGAGKP VDVIITGGGA
551 AKVAEALPPA FLAENTVRVA DNLVIHGLLN LIAAEGGESE HA*



ORF61ng-1 and ORF61-1 show 93.9% identity in 591 aa overlap: 

orf61ng-1.pep MTVLKPSHWRVLAELADGLPQHVSQLAREADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 60
              ||||| |||||||||||||||||||||| |||||||||||||||||||||||||||||||
orf61-1       MTVLKLSHWRVLAELADGLPQHVSQLARMADMKPQQLNGFWQQMPAHIRGLLRQHDGYWR 60

orf61ng-1.pep LVRPLAVFDAEGLRDLGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120
              |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
orf61-1       LVRPLAVFDAEGLRELGERSGFQTALKHECASSNDEILELARIAPDKAHKTICVTHLQSK 120

orf61ng-1.pep GRGRQGRKWSHRLGECLMFSFGWAFDRPQYELGSLSPVAALACRRALGCLGLETQIKWPN 180
              ||||||||||||||||||||||| |||||||||||||||| ||||||  |||  ||||||
orf61-1       GRGRQGRKWSHRLGECLMFSFGWVFDRPQYELGSLSPVAAVACRRALSRLGLDVQIKWPN 180

orf61ng-1.pep DLVVGRDKLGGILIETVRAGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA 240
              |||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||
orf61-1       DLVVGRDKLGGILIETVRTGGKTVAVVGIGINFVLPKEVENAASVQSLFQTASRRGNADA 240

orf61ng-1.pep AVLLETLLAELGAVLEQYAEEGFAPFLNEYETANRDHGKAVLLLRDGETVCEGTVKGVDG 300
              |||||||| || ||| |||  |||||  ||  |||||||||||||||||| |||||||||
orf61-1       AVLLETLLVELDAVLLQYARDGFAPFVAEYQAANRDHGKAVLLLRDGETVFEGTVKGVDG 300

orf61ng-1.pep RGVLHLETAEGEQTVVSGEISLRPDNRSVSVPKRPDSERFLLLEGGNSRLKWAWVENGTF 360
               |||||||||| ||||||||||| | | |||||| |||||||| ||||||||||||||||
orf61-1       QGVLHLETAEGKQTVVSGEISLRSDDRPVSVPKRRDSERFLLLDGGNSRLKWAWVENGTF 360

orf61ng-1.pep ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGESKKAQVKEQLARKIEWLPSSAQAL 420
              |||||||||||||||||||||||||||||||||||| ||||| |||||||||||||||||
orf61-1       ATVGSAPYRDLSPLGAEWAEKADGNVRIVGCAVCGEFKKAQVQEQLARKIEWLPSSAQAL 420

orf61ng-1.pep GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF 480
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf61-1       GIRNHYRHPEEHGSDRWFNALGSRRFSRNACVVVSCGTAVTVDALTDDGHYLGGTIMPGF 480

orf61ng-1.pep HLMKESLAVRTANLNRPAGKRYPFPTTTGNAVASGMMDAVCGSIMMMHGRLKEKNGAGKP 540
              |||||||||||||||| |||||||||||||||||||||||||| |||||||||| |||||
orf61-1       HLMKESLAVRTANLNRHAGKRYPFPTTTGNAVASGMMDAVCGSVMMMHGRLKEKTGAGKP 540

orf61ng-1.pep VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIHGLLNLIAAEGGESEHAX 593
              ||||||||||||||||||||||||||||||||||| |||| ||||| | || |
orf61-1       VDVIITGGGAAKVAEALPPAFLAENTVRVADNLVIYGLLNMIAAEGREYEHIX 593



Based on this analysis, including the homology with the baf protein of B. pertussis and the presence 
of a putative prokaryotic membrane lipoprotein lipid attachment site, it is predicted that these 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 

Example 29 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 241>: 

1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 

401 CGGaAGAGGG CGGCGaAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 

501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 

601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGC.. 

This corresponds to the amino acid sequence <SEQ ID 242; ORF62>: 

1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 

201 WSVGMVLSLL YLGLGC.. 

Further work revealed the complete nucleotide sequence <SEQ ID 243>: 

1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTAAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 

401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 

501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 

601 TGGAGCGTCG GGATGGTATT GTCGCTGCTG TATTTGGGTT TGGGGTGCGG 

651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 

701 ATGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG 

751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG CCTTGGGCGT 

801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA 

851 AATAA 

This corresponds to the amino acid sequence <SEQ ID 244; ORF62-1>: 

1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 

201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL 

251 AVLILGEHLS PVSALGVFVV IAATLVAGRL SHQK* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical transmembrane protein HI0976 of H. influenzae (accession number Q57147) 
ORF62 and HI0976 show 50% aa identity in 114 aa overlap: 

Orf62     1 MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60
            M YQILAL+IWSSS I  K  Y  +DP L+V VR             R   KI +   K
HI0976    1 MLYQILALLIWSSSLIVGKLTYSMMDPVLVVQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60

Orf62    61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114
            L  ++F NY    LLQF+GLKYTSA+SA  ++GLEPLL+VFVGHFFF  K   +
HI0976   61 LWWLAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLVVFVGHFFFKTKQNGF 114

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF62 shows 99.5% identity over a 216aa overlap with an ORF (ORF62a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf62.pep   MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a      MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf62.pep   LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a      LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf62.pep   AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62a      AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
                   130       140       150       160       170       180

                   190       200       210
orf62.pep   AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC
            |||||||||||||||||||||||||||||||||:||
orf62a      AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI
                   190       200       210       220       230       240

orf62a      SLEPVVGVLLAVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQKX
                   250       260       270       280

The complete length ORF62a nucleotide sequence <SEQ ID 245> is: 



1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGAGCAGCT CGTTTATTGC 

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTGAT TGCTGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TACTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCACT GCTGATGGTG 

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 

401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 

501 ACGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 

601 TGGAGCGTCG GAATGGTATT GTCGCTGCTG TATTTGGGCG TGGGGTGCAG 

651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 

701 ACGTTTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGCTG 

751 GCGGTTTTGA TTTTGGGCGA ACACCTGTCG CCCGTGTCCG TCTTGGGCGT 

801 GTTTGTCGTC ATCGCCGCCA CCTTGGTTGC CGGCCGGCTG TCGCATCAAA 

851 AATAA 

This encodes a protein having amino acid sequence <SEQ ID 246>: 



1 MFYQILALII WSSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 

201 WSVGMVLSLL YLGVGCSWYA YWLWNKGMSR VPANVSGLLI SLEPVVGVLL 

251 AVLILGEHLS PVSVLGVFVV IAATLVAGRL SHQK* 

ORF62a and ORF62-1 show 98.9% identity in 284 aa overlap: 



orf62a.pep  MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60

orf62a.pep  LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120

orf62a.pep  AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180

orf62a.pep  AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLI 240
            |||||||||||||||||||||||||||||||||:||:|||||||||||||||||||||||
orf62-1     AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI 240

orf62a.pep  SLEPVVGVLLAVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQKX 285
            |||||||||||||||||||||||:|||||||||||||||||||||
orf62-1     SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQKX 285
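
The identity figure quoted for ORF62a and ORF62-1 can be reproduced by counting matching positions, since the two proteins align without gaps. The sketch below is illustrative only and is not the program used for the original analysis; `percent_identity` is a hypothetical helper, the sequences are the translations of SEQ ID 245 and SEQ ID 243 transcribed without the stop codon, and residues 1 to 200 are held in one shared string because they are the same in both.

```python
def percent_identity(seq_a, seq_b):
    """Return (identical, compared, percent) for two ungapped, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length (gaps are not handled)")
    identical = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return identical, len(seq_a), 100.0 * identical / len(seq_a)


# Residues 1-200, identical in ORF62a and ORF62-1.
COMMON_1_200 = (
    "MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHV"
    "GKIPREEWKPLLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMV"
    "FVGHFFFNDKARAYHWICGAAAFAGVALLMAGGAEEGGEVGWFGCLLVLL"
    "AGAGFCAAMRPTQRLIARIGAPAFTSVSIAAASLMCLPFSLALAQSYTVD"
)
ORF62_1 = COMMON_1_200 + (
    "WSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLISLEPVVGVLL"
    "AVLILGEHLSPVSALGVFVVIAATLVAGRLSHQK"
)
ORF62A = COMMON_1_200 + (
    "WSVGMVLSLLYLGVGCSWYAYWLWNKGMSRVPANVSGLLISLEPVVGVLL"
    "AVLILGEHLSPVSVLGVFVVIAATLVAGRLSHQK"
)

identical, compared, pct = percent_identity(ORF62A, ORF62_1)
print(f"{identical}/{compared} identical ({pct:.1f}%)")
# 281/284 identical (98.9%)
```

The three mismatches fall at residues 214, 217 and 264. The alignment above numbers its last column 285 because it also displays the stop position as X.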



Homology with a predicted ORF from N. gonorrhoeae 

ORF62 shows 99.5% identity over a 216aa overlap with a predicted ORF (ORF62.ng) from N. 
gonorrhoeae: 

orf62.pep   MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60
            |||||||||||:||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng     MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP 60

orf62.pep   LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA 120

orf62.pep   AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62ng     AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA 180

orf62.pep   AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGC 216
            ||||||||||||||||||||||||||||||||||||
orf62ng     AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI 240



The complete length ORF62ng nucleotide sequence <SEQ ID 247> is: 

1 ATGTTTTACC AAATCCTTGC CCTGATTATC TGGGGCAGCT CGTTTATTGC 

51 CGCCAAATAT GTCTATGGCG GCATCGATCC CGCATTGATG GTCGGCGTGC 

101 GCCTGCTGAT TGCCGCGCTG CCTGCACTGC CCGCCTGCCG CCGTCATGTC 

151 GGCAAGATTC CGCGTGAGGA ATGGAAGCCG TTGCTGATTG TGTCGTTCGT 

201 CAACTATGTG CTGACCCTGC TGCTTCAGTT TGTCGGGTTG AAATACACTT 

251 CCGCCGCCAG CGCATCGGTC ATTGTCGGAC TCGAGCCGCT GCTGATGGTG 

301 TTTGTCGGAC ACTTTTTCTT CAACGACAAA GCGCGTGCCT ACCACTGGAT 

351 ATGCGGCGCG GCGGCATTTG CCGGTGTCGC GCTGCTGATG GCGGGCGGTG 

401 CGGAAGAGGG CGGCGAAGTC GGCTGGTTCG GCTGCCTGCT GGTGTTGTTG 

451 GCGGGCGCGG GCTTTTGTGC CGCTATGCGT CCGACGCAAA GGCTGATTGC 

501 CCGCATCGGC GCACCGGCAT TCACATCTGT TTCCATTGCC GCCGCATCGT 

551 TGATGTGCCT GCCGTTTTCG CTTGCTTTGG CGCAAAGTTA TACCGTGGAC 

601 TGGAGCGTCG GGATGGTATT GTCGCTGTTG TATTTGGGTT TGGGGTGCGG 

651 CTGGTACGCC TATTGGCTGT GGAACAAGGG GATGAGCCGT GTTCCTGCCA 

701 ACGCGTCGGG ACTGTTGATT TCGCTCGAAC CCGTCGTCGG CGTGCTGTTG 

751 GCGGTTTTGA TTTTGGGCGA ACATTTATCG CCCGTGTCCG CCTTGGGCGT 

801 GTTTGTCGTC ATCGCCGCCA CTTTCGCCGC CGGCCGGCTG TCGCGCAGGG 

851 ACGCGCAAAA CGGCAATGCC GTCTGA 

This encodes a protein having amino acid sequence <SEQ ID 248>: 



1 MFYQILALII WGSSFIAAKY VYGGIDPALM VGVRLLIAAL PALPACRRHV 

51 GKIPREEWKP LLIVSFVNYV LTLLLQFVGL KYTSAASASV IVGLEPLLMV 

101 FVGHFFFNDK ARAYHWICGA AAFAGVALLM AGGAEEGGEV GWFGCLLVLL 

151 AGAGFCAAMR PTQRLIARIG APAFTSVSIA AASLMCLPFS LALAQSYTVD 

201 WSVGMVLSLL YLGLGCGWYA YWLWNKGMSR VPANASGLLI SLEPVVGVLL 

251 AVLILGEHLS PVSALGVFVV IAATFAAGRL SRRDAQNGNA V* 

ORF62ng and ORF62-1 show 97.9% identity in 283 aa overlap: 



                    10        20        30        40        50        60
orf62ng.pep MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP
            |||||||||||:||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     MFYQILALIIWSSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHVGKIPREEWKP
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf62ng.pep LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAYHWICGA
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf62ng.pep AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf62-1     AAFAGVALLMAGGAEEGGEVGWFGCLLVLLAGAGFCAAMRPTQRLIARIGAPAFTSVSIA
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf62ng.pep AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANASGLLI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||:|||||
orf62-1     AASLMCLPFSLALAQSYTVDWSVGMVLSLLYLGLGCGWYAYWLWNKGMSRVPANVSGLLI
                   190       200       210       220       230       240

                   250       260       270       280       290
orf62ng.pep SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATFAAGRLSRRDAQNGNAVX
            ||||||||||||||||||||||||||||||||||::|||||::
orf62-1     SLEPVVGVLLAVLILGEHLSPVSALGVFVVIAATLVAGRLSHQKX
                   250       260       270       280

Furthermore, ORF62ng shows significant homology to a hypothetical H. influenzae protein: 

sp|Q57147|Y976_HAEIN HYPOTHETICAL PROTEIN HI0976 >gi|1074589|pir||B64163 
hypothetical protein HI0976 - Haemophilus influenzae (strain Rd KW20) 
>gi|1574004 (U32778) hypothetical [Haemophilus influenzae] Length = 128 

Score = 106 bits (262), Expect = 2e-22 

Identities = 56/114 (49%), Positives = 68/114 (59%) 

Query: 1  MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRXXXXXXXXXXXCRRHVGKIPREEWKP 60
          M YQILAL+IW SS I  K  Y  +DP L+V VR             R   KI +   K
Sbjct: 1  MLYQILALLIWSSSLIVGKLTYSMMDPVLVVQVRLIIAMIIVMPLFLRRWKKIDKPMRKQ 60

Query: 61 LLIVSFVNYVLTLLLQFVGLKYTSAASASVIVGLEPLLMVFVGHFFFNDKARAY 114
          L  ++F NY    LLQF+GLKYTSA+SA  ++GLEPLL+VFVGHFFF  K   +
Sbjct: 61 LWWLAFFNYTAVFLLQFIGLKYTSASSAVTMIGLEPLLVVFVGHFFFKTKQNGF 114



Based on this analysis, including the homology with the transmembrane protein of H. influenzae 
and the putative leader sequence and several transmembrane domains in the gonococcal protein, 
it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 
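
The text does not say which program predicted the leader sequence and transmembrane domains. As a hedged illustration of the general idea only, the sketch below computes a sliding-window hydropathy profile using the Kyte-Doolittle scale; the 19-residue window, the 1.6 threshold, and the names `hydropathy_profile` and `hydrophobic_windows` are assumptions made for this example and do not describe the original analysis.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window; raises KeyError on X or '*'."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose mean exceeds the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, mean in enumerate(profile) if mean > threshold]


# Residues 1-50 of the gonococcal protein (SEQ ID 248).
n_terminus = "MFYQILALIIWGSSFIAAKYVYGGIDPALMVGVRLLIAALPALPACRRHV"
print(hydrophobic_windows(n_terminus))
```

Runs of consecutive start positions in the output mark stretches that are hydrophobic enough to be candidate membrane-spanning segments; they are a screening aid, not a substitute for a dedicated predictor.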



Example 30 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 249>: 

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCmGwms TCCTGkkGTA 

51 sGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 

201 CGGTTCG£tA srTyGCCAAA gsGCCTgkks TGGG.ATGTT TACGCTGGTT 

251 GCCGkACTGC CCGGCGTGTT TCTGTTCGGC TTTCCCGCAC AGTTCATCAA 

301 CGGCACGATT AATTCGTGGT TCGGCAACGA TACCCACGAG GCGCTTGAAC 

351 GCAGCCTCAA TTTGAGCAAG TCCGCATTGA ATTTGGCGGC AGACAACGCC 

401 CTCGGCAACG CCGTCCCCGT GCAGATAGAC CTCATCGGCG CGGCTTCCCT 

451 GCCCGGGGAT ATGGGCAGGG TGCTGGAACA TTACGCCGGC AGCGGTTTTG 

501 CCCAGCTTGC CCTGTACAAy ksCGCAAGCG GCAAAATCGA AAAAAGCATC 

551 AACCCGCACA AGCTCGATCA GCCGTTTCCA GGTAAGGCGC GTTGGGAaAa 

601 AATCCaACGG GCGGGTTCGG TCAGGGATTT GGAAAGCATA GGCGGCGTAT 

651 TGTaCGCGCA GGGCTGGCTG TCGGCGGGTA CGCACwACGG GCGCGATTAC 

701 GCCTTGTTTT TCCGTCAGCC GGTTCCCAAA GGCGTGGCAG AGGATGCCGT 

751 yTTAATCGAA AAGGCAAGGG CGAAATATGC TGAGTTGAGT TACAGCAAAA 

801 AAGGTTTGCA GACCTTTTTC CTGGCAACCC TGCTGATTGC CTCGCTGCTG 

851 TCGATTTTTC TTGCACTGGT CATGGCACTG TATTTCGCCC GCCGTTTCGT 

901 CGAACCCGTC CTATCGCTTG CCGAGGGGGC GAAGGCGGTG GCGCAAGGCG 

951 ATTTCAGCCA GACGCGCCCC GTGTTGCGCA ACGACGAGTT CGGACGCTTG 

1001 ACCArGTTGT TCAACCACAT GACCGAGCAG CTTTCCATCG CCAAAGATGC 

1051 AGACGAGCGC AACCGCCGGC GCGAGGAAGC CGCCAGGCAT TATCTTGAAT 

1101 GCGTGTTGGA GGGGCTGACC ACGGGCGTGG TGGTGTTTGA CGAACAAGGC 

1151 TGTCTGAAAA CCTTCAACAA AGCGGCGGGT ACC.. 

This corresponds to the amino acid sequence <SEQ ID 250; ORF64>: 

1 MRRFLPIAAI CAXXLXXGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV 

51 LARYVILLLK DRRDGVFGSX XAKXPXXXMF TLVAXLPGVF LFGFPAQFIN 

101 GTINSWFGND THEALERSLN LSKSALNLAA DNALGNAVPV QIDLIGAASL 

151 PGDMGRVLEH YAGSGFAQLA LYNXASGKIE KSINPHKLDQ PFPGKARWEK 

201 IQRAGSVRDL ESIGGVLYAQ GWLSAGTHXG RDYALFFRQP VPKGVAEDAV 

251 LIEKARAKYA ELSYSKKGLQ TFFLATLLIA SLLSIFLALV MALYFARRFV 

301 EPVLSLAEGA KAVAQGDFSQ TRPVLRNDEF GRLTXLFNHM TEQLSIAKDA 

351 DERNRRREEA ARHYLECVLE GLTTGVVVFD EQGCLKTFNK AAGT.. 
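
The partial sequence SEQ ID 249 contains lower-case ambiguity symbols (m, w, s, k, y, r), and the X residues in SEQ ID 250 fall where those symbols leave a codon undetermined. The following sketch shows one way such a listing could be translated; it is an illustration under stated assumptions rather than the method actually used: the symbols are read as IUPAC nucleotide codes, the standard genetic code is applied in frame 1, and `translate_ambiguous` is a hypothetical name. A codon is reported as a definite residue only when every expansion of it encodes the same amino acid.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(dna):
    """Translate in frame 1, resolving ambiguous codons where possible.

    Characters that are not IUPAC nucleotide codes (digits, spaces, dots)
    are dropped before translation, so a stray letter would shift the frame.
    """
    seq = "".join(ch for ch in dna.upper() if ch in IUPAC)
    protein = []
    for i in range(0, len(seq) - 2, 3):
        expansions = product(*(IUPAC[base] for base in seq[i:i + 3]))
        residues = {CODON_TABLE["".join(codon)] for codon in expansions}
        protein.append(residues.pop() if len(residues) == 1 else "X")
    return "".join(protein)


# Nucleotides 28-54 of SEQ ID 249 (codons 10-18).
print(translate_ambiguous("ATATGCGCmGwmsTCCTGkkGTAsGGA"))
# ICAXXLXXG
```

The output matches residues 10 to 18 of SEQ ID 250: the codon GCm resolves to alanine because both GCA and GCC encode it, whereas Gwm, sTC, kkG and TAs remain X.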

Further work revealed the complete nucleotide sequence <SEQ ID 251>: 

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGTTGTA 

51 CGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCTGG GATGTTTACG CTGGTTGCCG 

251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT CATCAACGGC 

301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG 

351 CCTCAATTTG AGCAAGTCCG CATTGAATTT GGCGGCAGAC AACGCCCTCG 

401 GCAACGCCGT CCCCGTGCAG ATAGACCTCA TCGGCGCGGC TTCCCTGCCC 

451 GGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA 

501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC 

551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC 

601 CAACGGGCGG GTTCGGTCAG GGATTTGGAA AGCATAGGCG GCGTATTGTA 

651 CGCGCAGGGC TGGCTGTCGG CGGGTACGCA CAACGGGCGC GATTACGCCT 

701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA 

751 ATCGAAAAGG CAAGGGCGAA ATATGCTGAG TTGAGTTACA GCAAAAAAGG 

801 TTTGCAGACC TTTTTCCTGG CAACCCTGCT GATTGCCTCG CTGCTGTCGA 

851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA 

901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT 

951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA 

1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC 

1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGGCATTATC TTGAATGCGT 

1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC 

1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG 

1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG 

1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACGGCAACG GCGTGGTAAT 

1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT 

1451 GGGGCGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG 

1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT 

1551 GGATGAGCAG GATGCGCAAA TCCTGACGCG TTCGACCGAC ACCATCGTCA 

1601 AACAGGTGGC GGCATTGAAG GAAATGGTCG AAGCATTCCG CAATTATGCG 

1651 CGTTCCCCTT CGCTCAAATT GGAAAATCAG GATTTGAACG CCTTAATCGG 

1701 CGATGTGTTG GCATTGTATG AAGCCGGTCC GTGCCGGTTT GCGGCGGAGC 

1751 TTGCCGGCGA ACCGCTGACG GTGGCGGCGG ATACGACCGC CATGCGGCAG 

1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA 

1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAAC AGGGCAGGAC GGTCGGATTG 

1901 TCCTGACGGT TTGCGACAAC GGCAAAGGGT TCGGCAGGGA AATGCTGCAC 

1951 AACGCCTTCG AGCCGTATGT AACGGACAAA CCGGCGGGAA CGGGATTGGG 

2001 TCTGCCTGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CGCATCAGCC 

2051 TGAGCAATCA GGATGCGGGT GGCGCGTGTG TCAGAATCAT CTTGCCAAAA 

2101 ACGGTAAAAA CTTATGCGTA G 

This corresponds to the amino acid sequence <SEQ ID 252; ORF64-1>: 

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV 

51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING 

101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAVPVQ IDLIGAASLP 

151 GDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI 

201 QRAGSVRDLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPV PKGVAEDAVL 

251 IEKARAKYAE LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE 

301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD 

351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT 

401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL 

451 LGKATVLPED NGNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT 

501 PIQLSAERLA WKLGGKLDEQ DAQILTRSTD TIVKQVAALK EMVEAFRNYA 

551 RSPSLKLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLT VAADTTAMRQ 

601 VLHNIFKNAA EAAEEADVPE VRVKSETGQD GRIVLTVCDN GKGFGREMLH 

651 NAFEPYVTDK PAGTGLGLPV VKKIIEEHGG RISLSNQDAG GACVRIILPK 

701 TVKTYA* 



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF64 shows 92.6% identity over a 392aa overlap with an ORF (ORF64a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60
orf64.pep   MRRFLPIAAICAXXLXXGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK
            ||||||||||||  |  |||||||||||||||||||||||||||||||||||||||||||
orf64a      MRRFLPIAAICAVVLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK
                    10        20        30        40        50        60



70 80 90 100 110 120 

DRRDGVFGSXXAKXPXX XMFTLVAXLPGVFL FG FPAQFINGTINSWFGNDTHEALERSLN 

DRRDGVFGSQIAKR-LS GMFTLVAVLPGVFLFGV SAQ FINGT INSWFGNDTHEALERS LN 
70 80 90 100 110 

130 140 150 160 170 180 

LSKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNXASGKIE 
I I I I I I I I I I I I I I I I I : I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 
LSKSALNLAADNALGNAIPVQIDXIGAASLPXDMGRVLEHYAGSGFAQLALYNAASGKIE 
120 130 140 150 160 170 

190 200 210 220 230 240 

KSINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHXGRDYALFFRQP 



250 260 270 280 290 300 

VPKGVAEDAVLIEKAPvAKYAELSYSKKGLQT FFLAT LLIAS LLS I FLALVMALY FARRFV 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 

VPKGVAEDAVLIEKARAXXXXLSYSKKGLQT FFLAT LLIAS LLSI FLALVMALY FARRFV 
240 250 260 270 280 290 

310 320 330 340 350 360 

EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTXLFNHMTEQLSIAKDADERNRRREEA 
I I I I I I I I II I II I I I I I I I I I I I I I II I I I I I ! I I I I I I I I I 1 I I I : I I I I I I I I I I I 
EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEA 
300 310 320 330 340 350 

370 380 390 

ARHYLECVLEGLTTGVWFDEQGCLKTFNKAAGT 
I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 

ARHYLECVLEGLTTGVWFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSL 
360 370 380 390 400 410 

LAEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNXNGVVMVIDDITVLIHAQ 
420 430 440 450 460 470 

The complete length ORF64a nucleotide sequence <SEQ ID 253> is: 

1 ATGCGCCGTT TTCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGTTGTA 

51 CGGACTGACG GCGGCAACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATTGT TGCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCG ACGGCGTATT 

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTTACG CTGGTTGCCG 

251 TACTGCCCGG CGTGTTTCTG TTCGGCGTTT CCGCACAGTT TATCAACGGC 

301 ACGATTAATT CGTGGTTCGG CAACGATACC CACGAGGCGC TTGAACGCAG 

351 CCTCAATTTG AGCAAGTCCG CATTGAATCT GGCGGCAGAC AACGCCCTTG 

401 GCAACGCCAT CCCCGTGCAG ATAGACNTCA TCGGCGCGGC TTCCCTGCCC 

451 NGGGATATGG GCAGGGTGCT GGAACATTAC GCCGGCAGCG GTTTTGCCCA 

501 GCTTGCCCTG TACAATGCCG CAAGCGGCAA AATCGAAAAA AGCATCAACC 

551 CGCACAAGCT CGATCAGCCG TTTCCAGGTA AGGCGCGTTG GGAAAAAATC 

601 CAACAGGCGG GTTCGGTCAG GGATNNGGAA AGCATAGGCG GCGTATTGTA 

651 CGCGCANGGC TGGCTGTCGG CAGNNACGCA CAACGGGCGC GATTACGCCT 

701 TGTTTTTCCG TCAGCCGGTT CCCAAAGGCG TGGCAGAGGA TGCCGTCTTA 

751 ATCGAAAAGG CAAGGGCGNA ANANNNTNAG TTGAGTTACA GCAAAAAAGG 

801 TTTGCAGACC TTTTTCCTNG CAACCCTGCT GATTGCCTCN CTGCTGTCGA 

851 TTTTTCTTGC ACTGGTCATG GCACTGTATT TCGCCCGCCG TTTCGTCGAA 

901 CCCGTCCTAT CGCTTGCCGA GGGGGCGAAG GCGGTGGCGC AAGGCGATTT 

951 CAGCCAGACG CGCCCCGTGT TGCGCAACGA CGAGTTCGGA CGCTTGACCA 

1001 AGTTGTTCAA CCACATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC 

1051 GAGCGCAACC GCCGGCGCGA GGAAGCCGCC AGACATTATC TCGAATGCGT 

1101 GTTGGAGGGG CTGACCACGG GCGTGGTGGT GTTTGACGAA CAAGGCTGTC 

1151 TGAAAACCTT CAACAAAGCG GCGGAACAGA TTTTGGGGAT GCCGCTTACC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TTGCCGCCAT CGGCGCGGCG GCAGGTACGG 

1301 ACAAACCGGT CCATGTGAAA TATGCCGCGC CGGACGATGC CAAAATCCTG 

1351 CTGGGCAAGG CAACCGTCCT GCCCGAAGAC AACNGCAACG GCGTGGTAAT 

1401 GGTGATTGAC GACATCACCG TTTTGATACA CGCGCAAAAA GAAGCCGCGT 

1451 GGGGCGAAGT GGCAAAACGG CTGGCACACG AAATCCGCAA TCCGCTCACG 

1501 CCCATCCAGC TTTCTGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT 

1551 GGACGAGCAN GACGCGCAAA TCCTGACACG TTCGACCGAC ACCATCATCA 

1601 AACAAGTGGC GGCATTAAAA GAAATGGTCG AGGCATTCCG CAATTACNCG 

1651 CGTTCCCCTT CGWCTCAATT GGAAAATCAG GATTTGAACG CCTTAATCGG 

1701 CGATGTGTTG GCATTGTACG AAGCTGGTCC GTGCCGGTTT GCGGCGGAAC 

1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG 

1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA 

1851 TGTGCCCGAA GTCAGGGTAA AATCGGAAGC GGGGCAGGAC GGACGGATTG 

1901 TCCTGACAGT TTGCGACAAC GGCAAGGGGT TCGGCAGGGA AATGCTGCAC 

1951 AATGCCTTCG AGCCGTATGT AACGGACAAA CCGGCTGGAA CGGGATTGNG 

2001 ACTGCCCGTG GTGAAAAAAA TCATTGAAGA ACACGGCGGC CNCATCAGCC 

2051 TGAGCAATCA GGATGCGGGC GGCGCGTNTG TCAGAATCAT CTTGCCAAAA 

2101 ACGGTAGAAA CTTATGCGTA G 

This encodes a protein having amino acid sequence <SEQ ID 254>: 






1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVAFS AMLLLVLSAV 

51 LARYVILLLK DRRDGVFGSQ IAKRLSGMFT LVAVLPGVFL FGVSAQFING 

101 TINSWFGNDT HEALERSLNL SKSALNLAAD NALGNAIPVQ IDXIGAASLP 

151 XDMGRVLEHY AGSGFAQLAL YNAASGKIEK SINPHKLDQP FPGKARWEKI 

201 QQAGSVRDXE SIGGVLYAXG WLSAXTHNGR DYALFFRQPV PKGVAEDAVL 

251 IEKARAXXXX LSYSKKGLQT FFLATLLIAS LLSIFLALVM ALYFARRFVE 

301 PVLSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD 

351 ERNRRREEAA RHYLECVLEG LTTGVVVFDE QGCLKTFNKA AEQILGMPLT 

401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVHVK YAAPDDAKIL 

451 LGKATVLPED NXNGVVMVID DITVLIHAQK EAAWGEVAKR LAHEIRNPLT 

501 PIQLSAERLA WKLGGKLDEX DAQILTRSTD TIIKQVAALK EMVEAFRNYX 

551 RSPSXQLENQ DLNALIGDVL ALYEAGPCRF AAELAGEPLM MAADTTAMRQ 

601 VLHNIFKNAA EAAEEADVPE VRVKSEAGQD GRIVLTVCDN GKGFGREMLH 



CHIR-0160 (356.001) 



-198- 



PATENT 



651 NAFEPYVTDK PAGTGLXLPV VKKIIEEHGG XISLSNQDAG GAXVRIILPK 
701 TVETYA* 

ORF64a and ORF64-1 show 96.6% identity in 706 aa overlap: 

10 20 30 40 50 60 

orf64a pep MRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 

I I M M I I I I I I I ! I I I I i I M I M I I I I I I I I I I I I I M I I II II I ! M II I M I I I I I 

orf 64-1 MRRFL PIAAI CAWLLYGLTAATG ST S S LADYFWWIVAFSAMLLLVL S AVLARYVILLLK 

10 20 30 40 50 60 

70 80 90 100 110 120 
orf 64a pep DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 
M I I I I I I II I I 11 I I I I I II I II 1 I 1 M I I I I I I I I I 



130 140 150 160 170 180 

SKSALNLAADNALGNAIPVQIDXIGAASLPXDMGRVLEHYAGSGFAQLALYNAASGKIEK 
| | | | I I I I I I II I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
SKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNAASGKIEK 

130 140 150 160 170 180 

190 200 210 220 230 240 

SINPHKLDQPFPGKARWEKIQQAGSVRDXESIGGVLYAXGWLSAXTHNGRDYALFFRQPV 
II | I I I I I I I I I M I I I I I I I = I I I I I I I I I! I II I I Mill I I II M I I! M I I I I 

SINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHNGRDYALFFRQPV 
190 200 210 220 230 240 

250 260 270 280 290 300 

PKGVAEDAVLIEKARAXXXXLSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE 
I | M I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I II M I I I II II I I I I I 

PKGVAEDAVL IEKARAKYAEL S Y SKKGLQTFFLATLLI AS LLS I FLALVMAL YFARRFVE 

250 260 270 280 290 300 

310 320 330 340 350 360 

PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 

I || I M I I I I I I I 1 I I I I I I II I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I M 
PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 
310 320 330 340 350 360 

370 380 390 400 410 420 

RHYLECVLEGLTTGWVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 
I I I I I I I I I I I I I I I I I I I i I I I II I I I I I II I I I I I I I I I II I I II II I I I I I I I I I II 
RHYLECVLEGLTTGVWFDEQGCLKTFNBCAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 

370 380 390 400 410 420 

430 440 450 460 470 480 

AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNXNGVVMVIDDITVLIHAQK 
I II I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I 
AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNGNGWMVIDDITVLIHAQK 

430 440 450 460 470 480 

490 500 510 520 530 540 

EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEXDAQILTRSTDTIIKQVAALK 

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I = I I I I I I I 
EAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDEQDAQILTRSTDTIVKQVAALK 
490 500 510 520 530 540 

550 560 570 580 590 600 

EMVEAFRNYXRSPSXQLENQDLNALIGDVLALYEAGPCRFAAELAGEPLMMAADTTAMRQ 
I I I I I I I I I II I I : I II I I 1 I I I I I I I I I I I I I I I I I I I I I I 1 I I I I : I I I I I I I I I 
EMVEAFRNYARSPSLKLENQDLNALIGDVLALYEAGPCRFAAELAGEPLTVAADTTAMRQ 

550 560 570 580 590 600 

610 620 630 640 650 660 

VLHNIFKNAAEAAEEADVPEVRVKSEAGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 
I II II I I I I I II I I I I I I I I I 1 II I I : I I i I I I I I II I I I I I I I I I I I II I I I I I I I II I 
VLHNIFKNAAEAAEEADVPEVRVKSETGQDGRIVLTVCDNGKGFGREMLHNAFEPYVTDK 

610 620 630 640 650 660 






                   670       680       690       700
orf64a.pep  PAGTGLXLPVVKKIIEEHGGXISLSNQDAGGAXVRIILPKTVETYAX
            |||||| ||||||||||||| ||||||||||| |||||||||:||||
orf64-1     PAGTGLGLPVVKKIIEEHGGRISLSNQDAGGACVRIILPKTVKTYAX
                   670       680       690       700

Homology with a predicted ORF from N. gonorrhoeae 

ORF64 shows 86.6% identity over a 387aa overlap with a predicted ORF (ORF64.ng) from N. 
gonorrhoeae: 

orf64 pep MRRFLPIAAICAXXLXXGLTAATGSTSSLADYFWWIVAFSAMLLLVLSAVLARYVILLLK 60 

i I I I! I I I I I I I I I I I I I I I I I I I I M I I I I I I : I I II i I I ! I I I I I I I 

orf64ng mRRFLPIAAICAWLLYGLTAATGSTSSLADYFWWIVSFSAMLLLVLSAVLARYVILLLK 60 

orf64.pep DRRDGVFGSXXAKXPXXXMFTLVAXLPGVFLFGFPAQFINGTINSWFGNDTHEALERSLN 120 

|||:||||| II 1 I I I I I 111:1111: I I I I I I II I I I II I I I II I I I I I I I 

orf 64ng dRRNGVFGSQIAKR-LSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLN 119 

orf64 pep LSKSALNLAADNALGNAVPVQIDLIGAASLPGDMGRVLEHYAGSGFAQLALYNXASGKIE 180 

20 I I I I I I : I I I I M : : I II M I I II M : I II I : I I I I I I I II I I II I I I I I I I 

orf 64ng lsksaldlaadnavsnavpvqidligtaslsgnmgsvlehyagsgfaqlalynaasgkie 17 9 

orf64 pep KSINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHXGRDYALFFRQP 240 
I I I I M : : | | | : I I : I I : II : : I I I I : I I I I I II I I I I I I I I I 1 I 1 I I I I I I I I I I 1 
orf64ng KSINPHQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQP 239 

orf 64 .pep VPKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFV 300 

orf 64ng IPENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTLLIASLLSIFLALVMALYFARRFV 2 99 


orf 64 .pep EPVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTXLFNHMTEQLSIAKDADERNRRREEA 360 

I I : I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I : I I I I I I I I I I I 

orf 64ng EPILSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEA 35 9 

orf64.pep ARHYLECVLEGLTTGVVVFDEQGCLKTFNKAAGT 394 

I I I I I I I I I : I I I I I I I I : I : I 

orf64ng ARHYLECVLDGLTTGVWSYPLSCCRTAVFSTCHSSPLSYF 400 

An ORF64ng nucleotide sequence <SEQ ID 255> was predicted to encode a protein having amino 
acid sequence <SEQ ID 256>: 

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV 

51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING 

101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS 

151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI 

201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL 

251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE 

301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD 

351 ERNRRREEAA RHYLECVLDG LTTGVVVSYP LSCCRTAVFS TCHSSPLSYF* 

Further work revealed the complete gonococcal DNA sequence <SEQ ID 257>: 

1 ATGCGCCGCT TCCTACCGAT CGCAGCCATA TGCGCCGTCG TCCTGCTGTA 

51 CGGATTGACG GCGGCGACCG GCAGCACCAG TTCGCTGGCG GATTATTTCT 

101 GGTGGATAGT CTCGTTCAGC GCAATGCTGC TGCTGGTGTT GTCCGCCGTT 

151 TTGGCACGTT ATGTCATATT GCTGTTGAAA GACAGGCGCA ACGGCGTGTT 

201 CGGTTCGCAG ATTGCCAAAC GCCTTTCCGG GATGTTCACG CTGGTCGCCG 

251 TACTGCCCGG CTTGTTCCTG TTCGGCATTT CCGCGCAGTT TATCAACGGC 

301 ACGATTAATT CGTGGTTCGG CAACGACACC CACGAAGCCC TCGAACGCAG 

351 CCTTAATTTG AGCAAGTCCG CACTGGATTT GGCGGCAGAC AATGCCGTCA 

401 GCAACGCCGT TCCCGTACAG ATAGACCTCA TCGGCACCGC CTCCCTGTCG 

451 GGCAATATGG GCAGTGTGCT GGAACACTAC GCCGGCAGCG GTTTTGCCCA 

501 GCTTGCCCTG TACAATGCCG CAAGCGGGAA AATCGAAAAA AGCATCAATC 

551 CGCACCAATT CGACCAGCCG CTTCCCGACA AAGAACATTG GGAACAGATT 

601 CAGCAGACCG GTTCGGTTCG GAGTTTGGAA AGCATAGGCG GCGTATTGTA 

651 CGCGCAGGGA TGGTTGTCGG CAGGTACGCA CAACGGGCGC GATTACGCGC 

701 TGTTCTTCCG CCAGCCGATT CCCGAAAATG TGGCACAGGA TGCCGTTCTG 

751 ATTGAAAAGG CGCGGGCGAA ATATGCCGAA TTGAGTTACA GCAAAAAAGG 

801 TTTGCAGACC TTTTTTCTGG TAACCCTGCT GATTGCCTCG CTGCTGTCGA 

851 TTTTTCTTGC GCTGGTAATG GCACTGTATT TTGCCCGCCG TTTCGTCGAA 

901 CCCATTCTGT CGCTTGCCGA GGGCGCAAAG GCGGTGGCGC AGGGTGATTT 

951 CAGCCAGACG CGCCCCGTAT TGCGCAACGA CGAGTTCGGA CGTTTGACCA 

1001 AGCTGTTCAA CCATATGACC GAGCAGCTTT CCATCGCCAA AGAAGCAGAC 

1051 GAACGCAACC GCCGGCGCGA GGAAGCCGCC CGTCACTACC TCGAGTGCGT 

1101 GTTGGATGGG TTGACTACCG GTGTGGTGGT GTTTGACGAA AAAGGCCGTT 

1151 TGAAAACCTT CAACAAGGCG GCGGAACAGA TTTTGGGGAT GCCGCTCGCC 

1201 CCCCTGTGGG GCAGCAGCCG GCACGGTTGG CACGGCGTTT CGGCGCAGCA 

1251 GTCCCTGCTT GCCGAAGTGT TtgccgccAT CGGTGCGGCG GCAGGTACGG 

1301 ACAAACCGGT CCAGGTGGAA TATGCCGCGC CGGACGATGC CAAAATCCTG 

1351 CTGGGCAAGG CGACGGTATT GCCCGAAGAC AACGGCAACG GCGTGGTGAT 

1401 GGTGATTGAC GACATCACCG TGCTGATACG CGCGCAAAAA GAAGCCGCGT 

1451 GGGGTGAAGT GGCGAAGCGG CTGGCACACG AAATCCGCAA TCCGCTCACG 

1501 CCCATCCAGC TTTCCGCCGA ACGGCTGGCG TGGAAATTGG GCGGGAAGCT 

1551 GGACGATCAG GACGCGCAAA TCCTGACGCG TtcgACCGAC ACCATCATCA 

1601 AACAGgtggc gGCGTTAAAA GAAATGGTCG AGGCATTCCG CAATTACGCG 

1651 CGCGCCCCTT CGCTCAAACT GGAAAATCAG GATTTGAACG CCTTAATCGG 

1701 CGATGTTTTG GCCCTGTACG AAGCCGGCCC GTGCCGGTTT GAGGCGGAAC 

1751 TTGCCGGCGA ACCGCTGATG ATGGCGGCGG ATACGACCGC CATGCGGCAG 

1801 GTGCTGCACA ATATTTTCAA AAATGCCGCC GAAGCGGCGG AAGAAGCCGA 

1851 TATGCCCGAA GTCAGGGTAA AATCGGAAAC GGGGCAGGAC GGACGGATTG 

1901 TCCTGACGGT TTGCGACAAC GGCAAGGGAT TCGGCAAGGA AATGCTGCAC 

1951 AATGCTTTCG AGCCGTATGT GACGGATAAG CCGGCGGGAA CGGGACTGGG 

2001 TCTGCCTGTA GTGAAAAAAA TCATTGGAGA ACACGGCGGC CGCATCAGCC 

2051 TGAGCAATCA GGATGCGGGT GGGGCGTGTG TCAGAATCAT CTTGCCAAAA 

2101 ACGGTAGAAA CTTATGCGTA G 

This corresponds to the amino acid sequence <SEQ ID 258; ORF64ng-1>:

1 MRRFLPIAAI CAVVLLYGLT AATGSTSSLA DYFWWIVSFS AMLLLVLSAV

51 LARYVILLLK DRRNGVFGSQ IAKRLSGMFT LVAVLPGLFL FGISAQFING

101 TINSWFGNDT HEALERSLNL SKSALDLAAD NAVSNAVPVQ IDLIGTASLS 

151 GNMGSVLEHY AGSGFAQLAL YNAASGKIEK SINPHQFDQP LPDKEHWEQI 

201 QQTGSVRSLE SIGGVLYAQG WLSAGTHNGR DYALFFRQPI PENVAQDAVL 

251 IEKARAKYAE LSYSKKGLQT FFLVTLLIAS LLSIFLALVM ALYFARRFVE

301 PILSLAEGAK AVAQGDFSQT RPVLRNDEFG RLTKLFNHMT EQLSIAKEAD 

351 ERNRRREEAA RHYLECVLDG LTTGVVVFDE KGRLKTFNKA AEQILGMPLA

401 PLWGSSRHGW HGVSAQQSLL AEVFAAIGAA AGTDKPVQVE YAAPDDAKIL 

451 LGKATVLPED NGNGVVMVID DITVLIRAQK EAAWGEVAKR LAHEIRNPLT 

501 PIQLSAERLA WKLGGKLDDQ DAQILTRSTD TIIKQVAALK EMVEAFRNYA 

551 RAPSLKLENQ DLNALIGDVL ALYEAGPCRF EAELAGEPLM MAADTTAMRQ

601 VLHNIFKNAA EAAEEADMPE VRVKSETGQD GRIVLTVCDN GKGFGKEMLH 

651 NAFEPYVTDK PAGTGLGLPV VKKIIGEHGG RISLSNQDAG GACVRIILPK 

701 TVETYA*

ORF64ng-1 and ORF64-1 show 93.8% identity in 706 aa overlap:

10 20 30 40 50 60 

MRRFLPIAAI CAVVLLYGLT AATGSTSSLADYFWWIV3F3AMLLLVLSAVLARYVILLLK 

[ I I I I I I I I I I I II I I I M I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I 1 

MRRFLP I AAI CAVVLLYGLTAATGS T S SLADYFWWIVAFSAMLLLVLS AVLARYVI LLLK 
10 20 30 40 50 60 

70 80 90 100 110 120 

DRRHGVFGSQIAKRLSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLNL 
I I I : I I I I I I I I I I I I I 1 I I I I I I I I I : I I I I : I I I I I I i I I II I I I I I I 1 I I I I M I I I 
DRRDGVFGSQIAKRLSGMFTLVAVLPGVFLFGVSAQFINGTINSWFGNDTHEALERSLNL 
70 80 90 100 110 120 

130 140 150 160 170 180 

SKSALDLAADNAVSNAVPVQIDLIGTASLSGNMGSVLEHYAGSGFAQLAL YNAASGKIEK 
I I I I I : I I I I II : : I I I I I I I I I I I : I I I I : I I I I I I I I I I I I I I I 1 I I I I II II I I I 
SKSALNLAADNALGNAVPVQ I DLIGAASLPGDMGRVLEHYAGSGFAQLAL YNAASGKIEK 
130 140 150 160 170 180 



orf64ng-1.pep (upper rows), orf64-1 (lower rows)



190 200 210 220 230 240 

SINPHQFDQPLPDKEHWEQIQQTGSVRSLESIGGVLYAQGWLSAGTHNGRDYALFFRQPI 
| | | | | :: I I I : I I : I 1 : I I I 1 I I : I I I I 1 I 1 I I I I I I I I I I I I I I 1 I I I I I I M I = 
SINPHKLDQPFPGKARWEKIQRAGSVRDLESIGGVLYAQGWLSAGTHNGRDYALFFRQPV 
190 200 210 220 230 240 

250 260 270 280 290 300 

PENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTLLIASLLSIFLALVMALYFARRFVE 
| :: | 1 :[ I I I I ] I ! I I I I I I I I i I I I I I I I I I I : I I 1 1 I I 1 M 1 1 M i I 1 1 1 I I I M I 1 I 
PKGVAEDAVLIEKARAKYAELSYSKKGLQTFFLATLLIASLLSIFLALVMALYFARRFVE 
250 260 270 280 290 300 

310 320 330 340 350 360 

PILSLAEGAKAVAQGDFSQTRPVLRNDE FGRLTKLFNHMTEQLSIAKEADERNRRREEAA 

| : I |] M! I I I I I I I ! I I I I I I I I I I I I I I I I II I M I I I I I I I I I I I I I I I I I I I I I I I 

PVLSLAEGAKAVAQGDFSQTRPVLRNDEFGRLTKLFNHMTEQLSIAKEADERNRRREEAA 
310 320 330 340 350 360 

370 380 390 400 410 420 

RHYLECVLDGLTTGVWFDEKGRLKTFNKAAEQILGMPLAPLWGSSRHGWHGVSAQQSLL 
| | | | M | I : I I I M I I I I I I : I ! I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I 
RHYLECVLEGLTTGWVFDEQGCLKTFNKAAEQILGMPLTPLWGSSRHGWHGVSAQQSLL 
370 380 390 400 410 420 

430 440 450 460 470 480 

AEVFAAIGAAAGTDKPVQVEYAAPDDAKILLGKATVLPEDNGNGWMVIDDITVLIRAQK 

I I I I I I I I I I I I : I : I I I I I I I I I I I I I I I I I I 1 I I I I I! I I i I I : I I I 

AEVFAAIGAAAGTDKPVHVKYAAPDDAKILLGKATVLPEDNGNGWMVIDDITVLIHAQK 
430 440 450 460 470 480 

490 500 510 520 530 540 

EAAWGEVAKRLAHE IRN PLTP IQLS AERLAWKLGGKLDDQDAQILTRST DT I IKQVAALK 

M | I I I I I I I I M I I I I I I I I I 1 I I I I M I I I I I I I I I : I I I I I I I I I I II I : I I I I I I I 

EAAWGEVAKRLAHE IRNPLTP I QLSAERLAWKLGGKLDEQDAQI LTRSTDT IVKQVAALK 
490 500 510 520 530 540 

550 560 570 580 590 600 

EMVE AFRNYARAP S LKLEN QD LNAL I GDVLALYE AGPCRFE AELAGE PLMMAADTT AMRQ 

EMVEAFRNYARSPSLKLENQDLNALIGDVLALYEAGPCRFAAELAGEPLTVAADTTAMRQ 
550 560 570 580 590 600 

610 620 630 640 650 660 

VLHN I FKNAAEAAEEADMPEVRVKSET GQDGRIVLTVCDNGKGFGKEMLHNAFE PYVT DK 
I I I I I I I I I I I I i I I I I : i I I I I I I I I I I I II I I I I I I I I II I I I : I I I I I I I I I I I I I i 
VLHN I FKNAAEAAEEADVPEVRVKS ETGQDGRI VLTVCDNGKGFGREMLHNAFE PYVTDK 
610 620 630 640 650 660 

670 680 690 700 

PAGTGLGLPWKKI IGEHGGRISLSNQDAGGACVRIILPKTVETYAX 
I I I I I I I M I I I I I I I I I I I I I I I I I I I 11 I I i I i I I I I I I : I I I I 
PAGTGLGLPWKKI IEEHGGRISLSNQDAGGACVRIILPKTVKTYAX 
670 680 690 700 

Furthermore, ORF64ng-1 shows significant homology to a protein from A. caulinodans:

sp | Q04850 |NTRY_AZOCA NITROGEN REGULATION PROTEIN NTRY >gi | 77479 | pir | IS18624 ] 

protein - Azorhizobium caulinodans >gi 138737 (X63841) NtrY gene product 
[Azorhizobium caulinodans] Length = 771 
Score = 218 bits (550), Expect = 7e-56 

Identities = 195/720 (27%), Positives = 320/720 (44%), Gaps = 58/720 (8%) 

Query: 7 IAAICAWLLYGLTAATGSTSSLADYFWWIXXXXXXXXXXXXXXXXRYVILLLKDRRNGV 66 

I+A+ ++L GLT + + + R++KRG 

Sbjct: 35 ISALATFLILMGLTPWPTHQWIS VLLVNAAAVL I L S AMVGRE I WRI AKARARGR 90 

Query: 67 FGSQIAKRLSGMFTLVAVLPGLFLFGISAQFINGTINSWFGNDTHEALERSLNLSKSALD 126 

+++ R+ G+F +V+V+P + + +++ ++ ++ WF T E + S++++++ + 
Sbjct: 91 AAARLHIRIVGLFAWSWPAILVAWASLTLDRGLDRWFSMRTQEIVASSVSVAQTYVR 150 



Query: 127
Sbjct: 151

Query: 185
Sbjct: 201

Query: 234 -LFFRQPIPENVAQDAVLIEKARAKYAELSYSKKGLQTFFLVTXXXXXXXXXXXXXVMA 291
L++IV ++AYL+ G+Q F + +
Sbjct: 257

Query: 292
L F+ + V PI
Sbjct: 317

Query: 351
+ E VL G+ GV+ D + R+ N++AE++LG L+
Sbjct: 377

Query: 411 HGVSAQQSLIAEVFXXXXXXXXTDKPVQVEYAAPDDAKILLGKATVLPEDNG NGWM 467
V LL E + VQ D + + V E + +G V+
Sbjct: 435 EWPETAGLLEEA EHARQRSVQGNITLTRDGRERVFAVRVTTEQSPEAEHGWW 488

Query: 468 VIDDITVLIRAQKEAAWGEVAKRLAHEIRNPLTPIQLSAERLAWKLGGKLDDQDAQILTR 527
            +DDIT LI AQ+ +AW +VA+R+AHEI+NPLTPIQLSAERL  K G  +  QD +I  +
Sbjct: 489 TLDDITELISAQRTSAWADVARRIAHEIKNPLTPIQLSAERLKRKFGRHV-TQDREIFDQ 547

Query: 528
MV+ F ++AR P
Sbjct: 548

Query: 588
Sbjct: 608

Query: 640
+E + EPYVT + GTGLGL +V KI+ EHGG I L++ G GA +R+ L
Sbjct: 665 PQESRNRLLEPYVTTREKGTGLGLAIVGKIMEEHGGGIELNDAPEGRGAWIRLTL 724

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 31

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 259>: 

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC 

151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT

201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC 

351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA ACGCATCAAC CGTCATCGGG

451 CACGCGTTGG ATACG . . . 

This corresponds to the amino acid sequence <SEQ ID 260; ORF66>: 



1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP
51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA
101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPNASTVIG
151 HALDT... 

Further work revealed the complete nucleotide sequence <SEQ ID 261>: 

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCCGGCTGGT

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CTTTCCAAAT TTTCGGCATC CACACCACTT GGGGCGCATT TTCCTTTCCC 

151 TTCATCTTCC TTGCCACCGA CCTGACCGTC CGCATTTTCG GTTCTCACTT 

201 GGCACGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACAGG CTTGGGCGCG

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCCTTAG CCAGCTTTGC 

351 CGCCTACGCG ATCGGACAAA TCCTTGATAT TTTTGTATTC AACAAATTAC

401 GCCGTCTGAA AGCGTGGTGG ATTGCACCGA CCGCATCAAC CGTCATCGGC

451 AACGCCTTGG ATACGCTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG

501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC

551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG 

601 ATACTGAATC TGCTGACGAA AAAACTGACA ACCCTGCAAA CCAAACAGGC

651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA 

This corresponds to the amino acid sequence <SEQ ID 262; ORF66-1>:

1 MYAFTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFQIFGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA

101 LSEFNTFVGR IALASFAAYA IGQILDIFVF NKLRRLKAWW IAPTASTVIG

151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV

201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP* 

Computer analysis of this amino acid sequence gave the following results:

Homology with the hypothetical protein o221 of E. coli (accession number P37619) 
ORF66 and o221 protein show 67% aa identity in 155aa overlap: 

orf66   1 MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60
          M  F+  Q+ KALF L LFH+L+I +SNYLVQ P  I G HTTWGAFSFPFIFLATDLTV
o221    1 MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFIFLATDLTV 60

orf66  61 RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
          RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA
o221   61 RIFGAPLARRIIFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120

orf66 121 IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 155
          +GQILD+ VFN+LR+ + WW+AP AST+ G+  DT
o221  121 LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDT 155

Homology with a predicted ORF from N. meningitidis (strain A)

ORF66 shows 96.1% identity over a 155aa overlap with an ORF (ORF66a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf6 6.pep MYAFTAAQQQKALFRLVLFHI LI I AASNYLVQFPFQI FG IHTTWGAFS FP FI FLAT DLTV 
45 I I I I I I I I I I I I I I I I I li I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I 

orf66a MYAFTAAQQQKALFWLVLFHILIIAASNYLVQFPFQISGIHTTWGAFS FPFIFLATDLTV 

10 20 30 40 50 60 

70 80 90 100 110 120 

50 orf 66. pep RIFGSHLARR IIFWVMFPALLLSYVFS VL FHNGSWTGLGALSEFNTFVGRI ALAS FAAYA 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I If I M I I I I I I 
orf 66a RIFGSHLARR IIFWVMFPALLLSYVFS VL FHNGSWTGLGALSEFNTFVGRI ALASFAAYA 

70 80 90 100 110 120 

55 130 140 150 

orf66.pep IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT

orf66a VDYLFKLTVCGLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX

190 200 210 220

The complete length ORF66a nucleotide sequence <SEQ ID 263> is: 

1 ATGTACGCAT TTACCGCCGC ACAGCAACAG AAGGCACTCT TCTGGCTGGT 

51 GCTTTTTCAT ATCCTCATCA TCGCCGCCAG CAACTATCTG GTGCAGTTCC 

101 CCTTCCAAAT TTCCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC 

151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT

201 GGCACGGCGG ATTATCTTTT GGGTCATGTT CCCCGCCCTT TTGCTTTCCT 

251 ACGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG 

301 CTGTCCGAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC 

351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTTGTGTTC AACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG GTTGCCCCGA CTGCATCAAC CGTCATCGGC

451 AACGCCTTAG ATACGTTGGT ATTTTTCGCC GTTGCCTTCT ACGCAAGCAG 

501 CGATGGATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT CACCGTCTGC GGTCTGTTTT TCCTGCCCGC CTACGGCGTG 

601 ATTCTGAATC TGCTGACGAA AAAACTGACG ACCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGCGCCCT CGCTGCAAAA TCCGTAA

This encodes a protein having amino acid sequence <SEQ ID 264>: 

1 MYAFTAAQQQ KALFWLVLFH ILIIAASNYL VQFPFQISGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA

101 LSEFNTFVGR IALASFAAYA LGQILDIFVF NKLRRLKAWW VAPTASTVIG

151 NALDTLVFFA VAFYASSDGF MAANWQGIAF VDYLFKLTVC GLFFLPAYGV

201 ILNLLTKKLT TLQTKQAQDR PAPSLQNP*

ORF66a and ORF66-1 show 97.8% identity in 228 aa overlap: 

10 20 30 40 50 60 

orf 6 6a. pep MYAFTAAQQQKALFWLVLFHILIIAASNYLVQFPFQISGIHTTWGAFSFPFIFLATDLTV 
30 I I I I I I I I I I I I I I I I I I I I I M I I I i II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 6 6-1 MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 

10 20 30 40 50 60 

70 80 90 100 110 120 

35 orf 66a . pep RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 

I I I I I I I I I I I I I I I I I I I 1 I 1 I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 66-1 RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 
70 80 90 100 110 120 

40 130 140 150 160 170 180 

orf 66a . pep LGQILDIFVFNKLRRLKAWWVAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF 
: M M ! I I I I I I I I I I i I I I : I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I 

orf 6 6-1 IGQI LD I FVFNKLRRLKAWWIAPT ASTVI GNALDT LVFFAVAFYASS DGFMAANWQGI AF 

130 140 150 160 170 180 

45 

190 200 210 220 229 

orf 66a . pep VDYLFKLTVCGLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX 



Homology with a predicted ORF from N. gonorrhoeae

ORF66 shows 94.2% identity over a 155aa overlap with a predicted ORF (ORF66.ng) from N.
gonorrhoeae: 

orf 66 -pep MYAFTAAQQQKALFRLVL FHI LI I AASNYLVQFPFQI FG IHTTWGAFS FPFI FLAT DLTV 60 

M I : I I I I I I I I I I I 1 I i I I I I I I I I I I I I I I I I I : I I | | | | | | | | | | | | | M | | | | | | | 

orf 66ng MYALTAAQQQKALFRLVLFHI LI IAASNYLVQFPFRIFG IHTTWGAFS FPFI FLAT DLTV 60 

orf66.pep RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120
orf66ng RIFGSHLARRIIFWVMFPALSLSYVFSVLFHNGSWTGLGAPSQFNTFVGRIALASFAAYA 120

orf66.pep IGQILDIFVFNKLRRLKAWWIAPNASTVIGHALDT 155

: | I I I I I I I I : I I I I I I I II!III:MM 

orf66ng LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180

The complete length ORF66ng nucleotide sequence <SEQ ID 265> is: 

1 ATGTACGCAT tgaccgccgc acagcaacag aaggcactct tccggctggt 
51 gcttttccat atcctcatca tcgccgccag caactatctg gtgcagttcc 

101 CCTTCCGGAT TTTCGGCATC CACACCACTT GGGGCGCGTT TTCCTTTCCC 

151 TTCATCTTCC TCGCCACCGA CCTGACCGTC CGCATTTTCG GTTCGCACTT

201 GGCGCGGCGG ATTATCTTTT GGGTGATGTT CCCCGCCCTT ttgCTTTcat 

251 aCGTCTTTTC CGTTTTGTTC CACAACGGCA GTTGGACGGG CTTGGGCGCG

301 ctgTCCCAAT TCAACACCTT TGTCGGACGC ATCGCGCTGG CAAGTTTTGC 

351 CGCCTACGCG CTCGGACAAA TCCTTGATAT TTTCGTATTC GACAAATTAC 

401 GCCGTCTGAA AGCGTGGTGG ATTGCCCCGG CCGCATCAAC CGTCATCGGC

451 AATGCACTGG ACACGTTAGT ATTTTTTGCC GTTGCCTTTT ACGCAAGCAG 

501 CGATGAATTT ATGGCGGCAA ACTGGCAGGG CATCGCTTTT GTCGATTACC 

551 TGTTCAAACT TACCGTCTGC ACCCTCTTCT TCCTGCCCGC CTACGGCGTG 

601 ATACTGAATC TGCTGACGAA AAAACTGACG GCCCTGCAAA CCAAACAGGC 

651 GCAAGACCGC CCCGTGCCCT CGCTGCAAAA TCCGTAA

This encodes a protein having amino acid sequence <SEQ ID 266>: 

1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL SLSYVFSVLF HNGSWTGLGA

101 PSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG

151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV

201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP* 

An alternative annotated sequence is: 

1 MYALTAAQQQ KALFRLVLFH ILIIAASNYL VQFPFRIFGI HTTWGAFSFP

51 FIFLATDLTV RIFGSHLARR IIFWVMFPAL LLSYVFSVLF HNGSWTGLGA

101 LSQFNTFVGR IALASFAAYA LGQILDIFVF DKLRRLKAWW IAPAASTVIG

151 NALDTLVFFA VAFYASSDEF MAANWQGIAF VDYLFKLTVC TLFFLPAYGV

201 ILNLLTKKLT ALQTKQAQDR PVPSLQNP*

ORF66ng and ORF66-1 show 96.1% identity in 228 aa overlap: 

orf 66-1 .pep MYAFTAAQQQKALFRLVLFHILIIAASNYLVQFPFQIFGIHTTWGAFSFPFIFLATDLTV 60 
35 I I I : I I I I I I I I I I I I M I I I I I I I I I I I ! I I I I I : 1 I I I M I I I I I I I I I I I I I I I I I i 

or f 6 6ng MYALT AAQQQKALFRLVLFHI LI IAASNYLVQFPFRI FGI HTTWGAFS FPFI FLAT DLTV 6 0 

orf 66-1 . pep RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSEFNTFVGRIALASFAAYA 120 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I i 1 I I 
40 orf66ng RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSQFNTFVGRIALASFAAYA 120 

orf 66-1 .pep IGQILDIFVFNKLRRLKAWWIAPTASTVIGNALDTLVFFAVAFYASSDGFMAANWQGIAF 180 

: I I : I I I I I I I I I I I i : I I I ! I I I I I I I I I I I ! 1 I [ I I I I I I I I 

orf 66ng LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180 

45 

orf 66-1 -pep VDYLFKLTVCTLFFLPAYGVILNLLTKKLTTLQTKQAQDRPAPSLQNPX 229 

I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I : II I I I I I I I I : I I I I I I I 
orf66ng VDYLFKLTVCTLFFLPAYGVILNLLTKKLTALQTKQAQDRPVPSLQNPX 22 9 

Furthermore, ORF66ng shows significant homology with an E. coli ORF:

sp|P37619|YHHQ_ECOLI HYPOTHETICAL 25.3 KD PROTEIN IN FTSY-NIKA INTERGENIC
REGION (O221)
>gi|1073495|pir||S47690 hypothetical protein o221 - Escherichia coli >gi|466607
(U00039) No definition line found [Escherichia coli] >gi|1789882 (AE000423)
hypothetical 25.3 kD protein in ftsY-nikA intergenic region [Escherichia coli]
Length = 221

Score = 273 bits (692), Expect = 5e-73 

Identities = 132/203 (65%), Positives = 155/203 (76%) 



Query: 1   MYALTAAQQQKALFRLVLFHILIIAASNYLVQFPFRIFGIHTTWGAFSFPFIFLATDLTV 60
           M   +  Q+ KALF L LFH+L+I +SNYLVQ P  I G HTTWGAFSFPFIFLATDLTV
Sbjct: 1   MNVFSQTQRYKALFWLSLFHLLVITSSNYLVQLPVSILGFHTTWGAFSFPFIFLATDLTV 60

Query: 61  RIFGSHLARRIIFWVMFPALLLSYVFSVLFHNGSWTGLGALSQFNTFVGRIALASFAAYA 120
           RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA
Sbjct: 61  RIFGAPLARRIIFAVMIPALLISYVISSLFYMGSWQGFGALAHFNLFVARIATASFMAYA 120

Query: 121 LGQILDIFVFDKLRRLKAWWIAPAASTVIGNALDTLVFFAVAFYASSDEFMAANWQGIAF 180
           LGQILD+ VF++LR+ + WW+AP AST+ GN  DTL FF +AF+ S D FMA +W  IA
Sbjct: 121 LGQILDVHVFNRLRQSRRWWLAPTASTLFGNVSDTLAFFFIAFWRSPDAFMAEHWMEIAL 180

Query: 181 VDYLFKLTVCTLFFLPAYGVILN 203
           VDY FK+ +  +FFLP YGV+LN
Sbjct: 181 VDYCFKVLISIVFFLPMYGVLLN 203
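
The identity and positive counts reported in the summary line for this hit (132/203 and 155/203) can be rechecked from the match rows of the alignment. The following is a minimal sketch only: the function name `blast_summary` is hypothetical, the match rows are transcribed from the alignment above, and rounding to a whole percent is an assumption about how the report was produced.

```python
def blast_summary(match_rows, aligned_length):
    """Count identities and positives from BLAST match (midline) rows.

    A letter in a match row marks an identical residue pair, '+' marks a
    positively scoring substitution, and a space marks neither.
    """
    identities = sum(ch.isalpha() for row in match_rows for ch in row)
    plus_signs = sum(ch == "+" for row in match_rows for ch in row)
    positives = identities + plus_signs

    def pct(count):
        # Whole-percent rounding is an assumption, not taken from the report.
        return round(100 * count / aligned_length)

    return identities, positives, pct(identities), pct(positives)


# Match rows of the ORF66ng / o221 alignment (query residues 1-203).
match_rows = [
    "M   +  Q+ KALF L LFH+L+I +SNYLVQ P  I G HTTWGAFSFPFIFLATDLTV",
    "RIFG+ LARRIIF VM PALL+SYV S LF+ GSW G GAL+ FN FV RIA ASF AYA",
    "LGQILD+ VF++LR+ + WW+AP AST+ GN  DTL FF +AF+ S D FMA +W  IA",
    "VDY FK+ +  +FFLP YGV+LN",
]

result = blast_summary(match_rows, aligned_length=203)
print(result)  # (identities, positives, % identities, % positives)
```

Only letters and '+' characters are counted, so the result does not depend on the exact column spacing of the match rows.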





Based on this analysis, including the homology with the E. coli protein and the presence of several
putative transmembrane domains in the gonococcal protein, it is predicted that these proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 



Example 32 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 267>: 



1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAAyGCA GTmwrAATAT

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA

201 TTTAACACAC AyyCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC

301 CGCTTAGgCG CGAAATTCAG CACAAGGGCG GTtCCCTATG TCGGAACAGC

351 CcTTTTAGCC CACGACGTAT ACGAAAcTTT CAAAGAAGAC ATACAGGCAC

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGTAAA AGGCTACGAA

451 TATAGTAATT GCCTTTGGTA CGAAGACAAA AGACGTATTA ATAGAACCTA

501 TGGCTGCTAC GGCGTTGAT . .









This corresponds to the amino acid sequence <SEQ ID 268; ORF72>: 

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VXISETVSVD TGQGAKIHKF 

51 VPKNSKTYSS DLIKTVDLTH XPTGAKARIN AKITASVSRA GVLAGVGKLA 

101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFVKGYE 

151 YSNCLWYEDK RRINRTYGCY GVD . . 

Further work revealed the complete nucleotide sequence <SEQ ID 269>: 

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGGCG CGAAATTCAG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC 

451 TAA 

This corresponds to the amino acid sequence <SEQ ID 270; ORF72-1>:

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF 

51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA 

101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF72 shows 98.0% identity over a 147aa overlap with an ORF (ORF72a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf72.pep MVIKYTNLNFAKLSIIAILMMYSFEANANAVXISETVSVDTGQGAKIHKFVPKNSKTYSS
| | | | | | | | | M I I I I I II I I I I II I I M I I I I I M I I I I M I I I I I I I I II I I I I I I I I 
orf72a MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS
10 20 30 40 50 60 

70 80 90 100 110 120 

orf72 pep DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 
I || II I I I I I I I I I I I I I I I I M I I I I I I I I I I M II I I I I II I I I I I I I I I I I I I I I I 
orf72a DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 

70 80 90 100 110 120 

130 140 150 160 170 

orf 72 . pep HDVYETFKEDIQARGYQYDPETDKFVKGYEYSNCLWYE DKRRINRTYGCYGVD 

I I I I I I I I I I I I II I I I I I II I I I I : 1 
orf 72a HDVYETFKEDIQARGYQYDPETDKFAKVSGX 

130 140 150 

The complete length ORF72a nucleotide sequence <SEQ ID 271> is: 

1 ATGGTCATAA AATATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTGT TTCAGTTGAT ACCGGACAAG GTGCGAAAAT TCATAAGTTT 

151 GTACCTAAAA ATAGTAAAAC TTATTCATCT GATTTAATAA AAACGGTAGA 

201 TTTAACACAC ATCCCTACGG GCGCAAAAGC CCGAAT CAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGG CGGGGGTCGG CAAACTTGCC 

301 CGCTTAGGCG CGAAATT CAG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

4 01 GAGGCTACCA ATACGACCCC GAAACCGACA AATTTGCAAA GGTCTCAGGC 

451 TAA 

This encodes a protein having amino acid sequence <SEQ ID 272>: 

1 MVIKYTNLNF AKLSIIAILM MYSFEANANA VKISETVSVD TGQGAKIHKF
51 VPKNSKTYSS DLIKTVDLTH IPTGAKARIN AKITASVSRA GVLAGVGKLA 
101 RLGAKFSTRA VPYVGTALLA HDVYETFKED IQARGYQYDP ETDKFAKVSG 
151 * 

ORF72a and ORF72-1 show 100.0% identity in 150 aa overlap: 

10 20 30 40 50 60 

orf 72a. pep MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS 
I I I I I I I I I I I I I I I I II I I I II I I I I I I I I I I I 1 I I I I II I I I I I I I I I I I I 1 I I I I I I 
orf 72-1 MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 72a . pep DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 
I I I I I I I I I I I I I I I 1 I M I II II I I I I I I II I I I I I I 1 I I I I I I I I I II II I I I I I I II 
orf 72-1 DLIKTVDLTHIPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 

70 80 90 100 110 120

Homology with a predicted ORF from N. gonorrhoeae

ORF72 shows 89% identity over a 173aa overlap with a predicted ORF (ORF72.ng) from N. 
gonorrhoeae: 

orf72 pep MVIKYTNLNFAKLSIIAILMMYSFEANANAVXISETVSVDTGQGAKIHKFVPKNSKTYSS 60 

|| | : | | | | | | I I I I I II I I I I I I I I I 1 M I I I I I : I I I I I I I I I : I I I I I I : I : ' ' ' 
orf7 2ng mvtkHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS 60 

orf7 2 pep DLIKTVDLTHXPTGAKARINAKITASVSRAGVLAGVGKLARLGAKFSTRAVPYVGTALLA 120 

II | : M I I I I I I I I I I I I I I I I I N I I II I I : I I I I I : I I I I I : M I I I I I I I I I I I 
or f 7 2ng DLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA 12 0 

orf72 pep HDVYETFKEDIQARGYQYDPETDKFVKGYEYSNCLWYEDKRRINRTYGCYGVD 173 

| | | | || | I M : I I I I I I I I I I M I I : I I I ! I I I : I I I I I I M I I I I I 

orf7 2ng HDVYETFKEDIQARGCRYDPETDKFVKGYEYANCLWYEDERRINRTYGCYGVDSSIMRLM 180 

An ORF72ng nucleotide sequence <SEQ ID 273> was predicted to encode a protein having amino 
acid sequence <SEQ ID 274>: 

1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF

51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV

101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKFVKGYE 

151 YANCLWYEDE RRINRTYGCY GVDSSIMRLM PDRSRFPEVK QLMESQMYRL 

201 ARPFWNWRKE ELNKLSSLDW NNFVLNRCTF DWNGGGCAVN KGDDFRAGAS 

251 FSLGRNPKYK EEMDAKKPEE ILSLKVDADP DKYIEATGYP GYSEKVEVAP 

301 GTKVNMGPVT DRNGN PVQVA ATFGRDAQGN TTADVQVIPR PDLTPASAEA 

351 PHAQPLPEVS PAENPANNPD PDENPGTRPN PEPDPDLNPD ANPDTDGQPG 

401 TSPDSPAVPD RPNGRHRKER KEGEDGGLSC DYFPEILACQ EMGKPSDRMF 

451 HDISIPQVTD DKTWSSHNFL PSNGVCPQPK TFHVFGRQYR ASYEPLCVFA 

501 EKIR FAVLLA FIIMSAFWF G SLGGE* 

After further analysis, the following gonococcal DNA sequence <SEQ ID 275> was identified: 

1 ATGGTCACAA AACATACAAA TTTGAATTTT GCGAAATTGT CGATAATTGC 

51 AATTTTGATG ATGTATTCGT TTGAAGCGAA TGCAAATGCA GTAAAAATAT 

101 CTGAAACTCT TTCGGTTGAT ACCGGACAAG GCGCGAAAGT TCATAAGTTC 

151 GTTCCTAAAT CAAGTAATAT TTATTCATCT GATTTAACAA AAGCGGTAGA 

201 TTTAACGCAT ATCCCCACGG GCGCAAAAGC CCGAATCAAC GCCAAAATAA 

251 CCGCCAGCGT ATCCCGCGCC GGCGTATTGT CGGGGGTCGG CAAACTTGTC 

301 CGCCAAGGCG CGAAATTCGG CACAAGGGCG GTTCCCTATG TCGGAACAGC 

351 CCTTTTAGCC CACGACGTAT ACGAAACTTT CAAAGAAGAC ATACAGGCAC 

401 GAGGCTGCCG ATACGATCCC GAAACCGACA AATTT 

This corresponds to the amino acid sequence <SEQ ID 276; ORF72ng-1>:

1 MVTKHTNLNF AKLSIIAILM MYSFEANANA VKISETLSVD TGQGAKVHKF
51 VPKSSNIYSS DLTKAVDLTH IPTGAKARIN AKITASVSRA GVLSGVGKLV 
101 RQGAKFGTRA VPYVGTALLA HDVYETFKED IQARGCRYDP ETDKF 



ORF72ng-1 and ORF72-1 show 89.7% identity in 145 aa overlap:



orf7 2ng-l.pe MVTKHTNLNFAKLSIIAILMMYSFEANANAVKISETLSVDTGQGAKVHKFVPKSSNIYSS 

II 1:11111111 I I I I I I I I I I I I I I I I I I : : : : III 

orf72-l MVIKYTNLNFAKLSIIAILMMYSFEANANAVKISETVSVDTGQGAKIHKFVPKNSKTYSS 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 7 2ng-l . pe DLTKAVDLTHIPTGAKARINAKITASVSRAGVLSGVGKLVRQGAKFGTRAVPYVGTALLA 



130 140

orf 7 2ng-l .pe HDVYETFKEDIQARGCRYDPETDKF 






orf72-1 HDVYETFKEDIQARGYQYDPETDKFAKVSGX

130 140 150 

Based on this analysis, including the presence of a putative leader sequence and transmembrane 
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

Example 33 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 277>: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT 

101 TGATGGCGGC AGGTTTTGCC GCGGGCGTGC TGATGCTCAG GCAAACCGGG 

151 gCTGACCGGT CTTTTATTGG CGGGCGCGGC AATGAGAAGC GGCGGGAAGG 

201 TATCCGTTTA TCAGATGTTG TGGCCTATC. . 

This corresponds to the amino acid sequence <SEQ ID 278; ORF73>: 

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRQTG 
51 LTGLLLAGAA MRSGGKVSVY QMLWPI . . 

Further work revealed the complete nucleotide sequence <SEQ ID 279>: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGCTGG ACGTTGTTTT 

101 TGATGGCGGC AGGTTTTGCC GCCGGCGTGC TGATGCTCAG GCATACGGGG 

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT 

201 ATCCGTTTAT CAGATGTTGT GGCCTATCCG TTATACGGTG GCGGCTGTGT 

251 GTCTGATGAG TCCGGGATTC GTATCCTCGG TGTTGGCGGT ATTGCTGCTG 

301 CTGCCGTTTA AGGGAGGGGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT 

351 TTTCAACATG AACCAATCGG GCAGAAAAGA GGGCTTTTCC CGCGATGACG 

401 ATATTATCGA GGGAGAATAT ACGGTTGAAG AGCCTTACGG CGGCAATCGT

451 TCCCGAAACG CCATCGAACA CAAAAAAGAC GAATAA

This corresponds to the amino acid sequence <SEQ ID 280; ORF73-1>:

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAAGFA AGVLMLRHTG

51 LSGLLLAGAA MRSGGRVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL

101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFS RDDDIIEGEY TVEEPYGGNR 

151 SRNAIEHKKD E* 
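
The correspondence between a nucleotide sequence and its amino acid sequence can be checked by translating the DNA in reading frame 1. The following is a minimal sketch, assuming the standard genetic code; the names `CODON_TABLE` and `translate` are hypothetical, and only the first 60 nucleotides of SEQ ID 279 are used as input.

```python
# Standard genetic code, laid out in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 1; codons with ambiguous bases become 'X'."""
    seq = "".join(dna.split()).upper()  # drop layout spaces, fold case
    usable = len(seq) - len(seq) % 3    # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 60 nucleotides of SEQ ID 279, in the listing's 10-base groups.
first_60_nt = (
    "ATGAGATTTT TCGGTATCGG TTTTTTGGTG "
    "CTGCTGTTTT TGGAGATTAT GTCGATTGTG"
)
protein_start = translate(first_60_nt)
print(protein_start)  # MRFFGIGFLVLLFLEIMSIV
```

The printed residues agree with the first two ten-residue groups of SEQ ID 280. Stop codons are rendered as `*`, matching the notation used in the listings.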

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF73 shows 90.8% identity over a 76aa overlap with an ORF (ORF73a) from strain A of N.
meningitidis: 

10 20 30 40 50 60 

orf7 3 pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFA AGVLMLRQTGLTGLLLAGAA 
I | | | | I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I Ml I I : I I I : I I I : I I I I I I II 
or f 7 3a MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGWMLRHTGLSGLLLAGAA 



orf73.pep MRSGGKVSVYQMLWPI

orf73a MRSGGRVSVYXMLWXIRYTVAAVCXMSPGFVSSVXAVLLXLPFKGGAVLQAGGAENFFNM

The complete length ORF73a nucleotide sequence <SEQ ID 281> is:

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAGATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGTTGGG CGGCGGTTGG ACGCTGTTTC 

101 TAATGGCGGC AACCTTTGCC GCCGGCGTGG TGATGCTCAG GCATACGGGG 

151 CTGTCCGGTC TTTTATTGGC GGGCGCGGCA ATGAGAAGCG GCGGGAGGGT 

201 ATCCGTTTAT CANATGTTGT GGCNTATCCG TTATACGGTG GCGGCGGTGT 

251 GTCNGATGAG TCCGGGATTC GTATCCTCGG TGTNGGCGGT ATTGCTGNTG 

301 CTNCCGTTTA AGGGAGGTGC AGTGTTGCAG GCAGGAGGTG CGGAAAATTT 

351 TTTCAACATG AACCANTCGG GCAGAAAAGA NGGCNTTTCC CGCGATGACG 

401 ATATTATCGA GGGGGAATAT ACGGTTGAAG ANCCTTACGG CGGCANTCGT 

451 TTCCGAAACG CCNTNGAACA CAAAAAAGAC GAATAA

This encodes a protein having amino acid sequence <SEQ ID 282>:

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVVMLRHTG

51 LSGLLLAGAA MRSGGRVSVY XMLWXIRYTV AAVCXMSPGF VSSVXAVLLX

101 LPFKGGAVLQ AGGAENFFNM NXSGRKXGXS RDDDIIEGEY TVEXPYGGXR 

151 FRNAXEHKKD E* 

ORF73a and ORF73-1 show 91.3% identity in 161 aa overlap 

10 20 30 40 50 60 

orf 73a pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGWMLRHTGLSGLLLAGAA 

I I I I I I I I I I I I I I M I I I I I I I M I I I I I I I ! M I I I I I I I = I I I M I I I I I I II I I I 
orf 73-1 MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRHTGLSGLLLAGAA 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 73a . pep MRSGGRVSVYXMLWXIRYTVAAVCXMSPGFVSSVXAVLLXLPFKGGAVLQAGGAENFFNM 

orf 7 3-1 MRSGGRVSVY QMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 

70 80 90 100 110 120 

130 140 150 160 

or f 7 3a . pep NXSGRKXGXSRDDDIIEGEYTVEXPYGGXRFRNAXEHKKDEX 

I I I I I I II M I I I I M I I II MM I III II II II I 

orf 7 3-1 NQS GRKEGFSRDDD 1 1 EGEYTVEEPYGGNRSRNAI EHKKDEX 

130 140 150 160 

Homology with a predicted ORF from N. gonorrhoeae 

ORF73 shows 92.1% identity over a 76aa overlap with a predicted ORF (ORF73.ng) from N. 
gonorrhoeae: 

orf 73 -pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRQTGLTGLLLAGAA 60 

1 I II I I I M II I I I I I II I I I I I I I I I I II I I I I I I I I M II II M : I I I : I I I I II M 
orf 7 3ng MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA 60 

orf 7 3. pep MRSGGKVSVYQMLWPI 7 6 

: : I : M M II M I I II 

orf 7 3ng VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 12 0 
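
The stated identity over the 76aa overlap can be rechecked by comparing the two aligned sequences position by position. The following is a minimal sketch that assumes an ungapped overlap, as in the alignment above; the function name `percent_identity` is hypothetical, and the ORF73ng residues are the first 76 shown in the alignment rows.

```python
def percent_identity(seq_a, seq_b):
    """Identity over an ungapped overlap of two aligned sequences.

    Returns (matching positions, overlap length, percent to one decimal).
    """
    overlap = min(len(seq_a), len(seq_b))
    # zip stops at the shorter sequence, so only the overlap is compared.
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches, overlap, round(100 * matches / overlap, 1)


# ORF73 (SEQ ID 278), written in ten-residue groups.
orf73 = (
    "MRFFGIGFLV" "LLFLEIMSIV" "WVADWLGGGW" "TLFLMAAGFA"
    "AGVLMLRQTG" "LTGLLLAGAA" "MRSGGKVSVY" "QMLWPI"
)
# First 76 residues of ORF73ng, as shown in the alignment rows.
orf73ng = (
    "MRFFGIGFLV" "LLFLEIMSIV" "WVADWLGGGW" "TLFLMAATFA"
    "AGVLMLRHTG" "LSGLLLAGAA" "VKSSGKVSVY" "QMLWPI"
)

identity = percent_identity(orf73, orf73ng)
print(identity)  # (70, 76, 92.1)
```

Six positions differ (38, 48, 52, 61, 62 and 64), which gives 70/76 and reproduces the 92.1% figure.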

The complete length ORF73ng nucleotide sequence <SEQ ID 283> is: 

1 ATGAGATTTT TCGGTATCGG TTTTTTGGTG CTGCTGTTTT TGGAAATTAT 

51 GTCGATTGTG TGGGTTGCCG ATTGGCTGGG CGGCGGTTGG AcgcTGTTTC 

101 TAATGGCGGC AACCTTTGCC GCCGGTGTGC TGATGCTCAG GCATAcggGG 

151 CTGTCCGGTC TTTTATTGGC TGGCGCGGCG GTAAAAagta gtgGGAAGGT 

201 ATCTGTTTAT CagatgtTGT GGCCTATCCG TTATAcggtg gcggcggtgT 

251 GTCTGatgag tCcggGATTC GTATCCTccg tgttggCGGT ATTGCTGCTG 

301 CTGCcgttta aggGaggGgc agtgttgcag gcaggaggtg cggaaaATTT 

351 TTTCAACATg aaCcaatcgg gcagaaAaga gggatttttc cacgatgacg 

401 atattatcga gggagaatat acggttgaaa aacctgacgg cggcaatcgt 

451 tcccgaAAcg ccatcgaaca cgaaaAagac gaataA 

This encodes a protein having amino acid sequence <SEQ ID 284>:

1 MRFFGIGFLV LLFLEIMSIV WVADWLGGGW TLFLMAATFA AGVLMLRHTG

51 LSGLLLAGAA VKSSGKVSVY QMLWPIRYTV AAVCLMSPGF VSSVLAVLLL

101 LPFKGGAVLQ AGGAENFFNM NQSGRKEGFF HDDDIIEGEY TVEKPDGGNR 

151 SRNAIEHEKD E* 
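
The relationship between the nucleotide listing <SEQ ID 283> and the protein listing <SEQ ID 284> can be checked with a short translation routine. The sketch below is illustrative only: it assumes the standard genetic code and reading frame 1, and the names `CODON_TABLE` and `translate` are hypothetical helpers, not part of the original analysis.

```python
# Minimal sketch: translate a DNA listing (frame 1) with the standard genetic code.
# Assumption: the standard codon table applies; unknown codons are reported as "X".
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon table from the compact TCAG-ordered string above.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string; whitespace and letter case are ignored."""
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# First 30 and last 36 nucleotides of the ORF73ng listing, copied from the text.
print(translate("ATGAGATTTT TCGGTATCGG TTTTTTGGTG"))
print(translate("tcccgaAAcg ccatcgaaca cgaaaAagac gaataA"))
```

The two fragments translate to the first ten and last eleven residues of the protein listing, the latter followed by the stop symbol.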

ORF73ng and ORF73-1 show 93.8% identity in 161 aa overlap:

10 20 30 40 50 60 

orf73-1.pep MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRHTGLSGLLLAGAA

| I | | | | | | | | ] I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I i I i i I I II 

orf7 3nq MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf73-l pep MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 
:: | : | : | I I I I M I I I I I I I II I I I I I II I M I I I I I I I I I II M I I I I 1 I I I I I II I I I 
orf 73nq VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM 

70 80 90 100 110 120 

130 140 150 160 

orf 73-1 .pep NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDEX 

orf73ng NQSGRKEGFFHDDDIIEGEYTVEKPDGGNRSRNAIEHEKDEX 
130 140 150 160 
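
The identity figure quoted for this alignment can be recomputed directly from the two aligned sequences. The sketch below is a hedged illustration: it assumes the alignment is ungapped (as shown above), excludes the terminal stop symbol, and uses a hypothetical helper `percent_identity`. It does not reproduce whatever alignment program was originally used.

```python
# Minimal sketch: percent identity over an ungapped alignment.
# Sequences are transcribed from the ORF73-1 / ORF73ng alignment (stop symbol omitted).
def percent_identity(seq_a, seq_b):
    """Return (identical positions, percent identity) for equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return matches, 100.0 * matches / len(seq_a)


ORF73_1 = (
    "MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAAGFAAGVLMLRHTGLSGLLLAGAA"
    "MRSGGRVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM"
    "NQSGRKEGFSRDDDIIEGEYTVEEPYGGNRSRNAIEHKKDE"
)
ORF73NG = (
    "MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA"
    "VKSSGKVSVYQMLWPIRYTVAAVCLMSPGFVSSVLAVLLLLPFKGGAVLQAGGAENFFNM"
    "NQSGRKEGFFHDDDIIEGEYTVEKPDGGNRSRNAIEHEKDE"
)

matches, pct = percent_identity(ORF73_1, ORF73NG)
print(f"{matches}/{len(ORF73_1)} identical ({pct:.1f}%)")  # 151/161 identical (93.8%)
```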

Based on this analysis, including the presence of a putative leader sequence and putative 
transmembrane domain in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 
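
The text does not state which method was used to predict the leader sequence or transmembrane domain. As one possible illustration, the sketch below scans the N-terminal 60 residues of the gonococcal protein with a sliding-window Kyte-Doolittle hydropathy average. The scale, the 19-residue window, the 1.6 threshold and the helper names are all assumptions made for this example.

```python
# Minimal sketch: sliding-window hydropathy scan (Kyte-Doolittle scale).
# Assumption: window=19 and threshold=1.6 are conventional choices, not values from the text.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment):
    """Average hydropathy of a peptide segment."""
    return sum(KYTE_DOOLITTLE[residue] for residue in segment) / len(segment)


def hydrophobic_windows(sequence, window=19, threshold=1.6):
    """Return (1-based start, mean score) for every window at or above the threshold."""
    hits = []
    for start in range(len(sequence) - window + 1):
        score = mean_hydropathy(sequence[start:start + window])
        if score >= threshold:
            hits.append((start + 1, round(score, 2)))
    return hits


# Residues 1-60 of the ORF73ng protein, transcribed from the alignment above.
N_TERMINUS = "MRFFGIGFLVLLFLEIMSIVWVADWLGGGWTLFLMAATFAAGVLMLRHTGLSGLLLAGAA"

for start, score in hydrophobic_windows(N_TERMINUS):
    print(start, score)
```

Windows that score above the threshold only flag candidate hydrophobic segments; they do not by themselves distinguish a cleavable leader from a membrane-spanning helix.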

Example 34 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 285>:

1 ATGTTTGTTT TTCAGACGGC ATTCTT.ATG TTTCAGAAAC ATTTGCAGAA 

51 AGCCTCCGAC AGCGTCGTCG GAGGGACATT ATACGTGGTT GCCACGCCCA 

101 TCGGCAATTT GGCGGACATT ACCCTGCGCG CTTTGGCGGT ATTGCAAAAG 

151 GCG GCCGA AGACACGCGC GTTACCGCAC AGCTTTTGAG 

201 CGCGTACGGC ATTCAGGGCA AACTCGTCAG TGTGCGCGAA CACAACGAAC 

251 GGCAGATGGC GGACAAGATT GTCGGCTATC TTTCAGACGG CATGGTTGTG 

301 GCACAGGTTT CCGATGCGGG TACGCCGGCC GTGTGCGACC CGGGCGCGAA 

351 ACTCGCCCGC CGCGTGCGTG AGGCCGGGTT TAAAGTCGTT CCCGTCGTGG 

401 GCGCAAC.GC GGTGATGGCG GCTTTGAGCG TGGCCGGTGT GGAAGGATCC

451 GATTTTTATT TCAACGGTTT TGTACCGCCG AAATCGGGAG AACGCAGGAA

501 ACTGTTTGCC AAATGGGTGC GGGCGGCGTT TCCTATCGTC ATGTTTGAAA

551 CGCCGCACCG CATCGGTGCA GCGCTTGCCG ATATGGCGGA ACTGTTCCCC 

601 GAACGCCGAT TAATGCTGGC GCGCGAAATT ACGAAAACGT TTGAAACGTT 

651 CTTAAGCGGC ACGGTTGGGG AAATTCAGAC GGCATTGTCT GCCGACGGCG

701 ACCAATCGCG CGGCGAGATG GTGTTGGTGC TTTATCCGGC GCAGGATGAA 

751 AAACACGAAG GCTTGTCCGA GTCCGCGCAA AACATCATGA AAATCCTCAC

801 AGCCGAGCTG CCGACCAAAC AGGCGGCGGA GCTTGCTGCC AAAATCACGG 

851 GCGAGGGAAA GAAAGCTTTG TACGAT. . 

This corresponds to the amino acid sequence <SEQ ID 286; ORF75>: 

1 MFVFQTAFXM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK

51 A....AEDTR VTAQLLSAYG IQGKLVSVRE HNERQMADKI VGYLSDGMVV

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGAXAVMA ALSVAGVEGS

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPIV MFETPHRIGA ALADMAELFP

201 ERRLMLAREI TKTFETFLSG TVGEIQTALS ADGDQSRGEM VLVLYPAQDE

251 KHEGLSESAQ NIMKILTAEL PTKQAAELAA KITGEGKKAL YD..

Further work revealed the complete nucleotide sequence <SEQ ID 287>:

1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 

201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG 

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC 

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC

801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC

851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence <SEQ ID 288; ORF75-1>:

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP

151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF75 shows 95.8% identity over a 283aa overlap with an ORF (ORF75a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

or f 7 5 . pep MFVFQT AFXMFQKHLQKAS D S WGGTLY WAT P I GNL AD I T LRALAVLQKAXXXXAE DTR 
I [ ! I ! I I M I I II I I I 1 I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I 

or f 7 5a MFQKHLQKAS D S WGGT LYWATPI GNL AD I T LRAL AVLQKAD 1 1 CAE DTR 

10 20 30 40 50 

70 80 90 100 110 120 

or f 7 5 . pep VTAQLL S AYG IQGKLVS VRE HNERQMADKI VGYL S DGMWAQVS DAGT PAVCD PGAKLAR 
I I II II I I I I I I I I I I II I I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
o r f 7 5 a VTAQLL S AYG I QGKLVS VRE HNERQMADKI VGYLS DGMWAQVS DAGT PAVCD PGAKLAR 

60 70 80 90 100 110 

130 140 150 160 170 180 

or f 7 5 . pep RVREAGFK WPVVGAXAVMAALSVA GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV 
I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I : I 
or f 7 5a RVREVGFK WPVVGASAVMAALSVA GVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVV 
120 130 140 150 160 170 

190 200 210 220 230 240 

orf 7 5 . pep MFETPHRIGAALADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGDQSRGEM 
I I II II I I 1 I : I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I : I I I : I I I I I I 
or f 7 5a MFETPHRIGATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM 
180 190 200 210 220 230 

250 260 270 280 290 

orf 75 . pep VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYD 
I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I i I i I i M I I 1 1 II 
orf 7 5a VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNK 
240 250 260 270 280 290 



orf75a X

The complete length ORF75a nucleotide sequence <SEQ ID 289> is:

1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 

201 CAGCGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGTCGG 

351 GTTTAAAGTT GTCCCTGTTG TCGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GTGTGGCTGG TGTGGCGGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGTGGC

501 GTTTCCCGTC GTGATGTTTG AAACGCCGCA CCGCATCGGG GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 

751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCC GCCAAAATCA CGGGCGAGGG AAAAAAAGCT TTGTACGATC 

851 TGGCACTGTC TTGGAAAAAC AAATGA 

This encodes a protein having amino acid sequence <SEQ ID 290>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP

101 AVCDPGAKLA RRVREVGFKV VPVVGASAVM AALSVAGVAG SDFYFNGFVP

151 PKSGERRKLF AKWVRVAFPV VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

ORF75a and ORF75-1 show 98.3% identity in 291 aa overlap: 

10 20 30 40 50 60 

orf75a pep MFQKHLQKASDSWGGTLYWATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY 
I | I I I I I I I I I I I I I I I I I 1 II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 M I I I 
orf 7 5-1 MFQKHLQKAS DSVVGGTLYVVATPIGNLADITLRALAVLQKADI I CAEDTRVTAQLLS AY 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 7 5a pep GIQGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLARRVREVGFKV 
i M M I I I I I I I I I I I I I I I I i I I M I I I I 1 I I I I I I I I I I I I I I I I I 1 I 1 I I I I : I I I I 
orf 7 5-1 GIQGKLVSVREHNERQMADKIVGYLSDGMWAQVSDAGTPAVCDPGAKLARRVREAGFKV 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf 7 5a . pep VPWGASAVMAALSVAGVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPWMFETPHRIG 
I I I I I I I I I I I I I I 1 I I I I I I I I ! I I 1 M I I I I I I I I I I I I I I I : I I I : I I I I I I I ! I I 
orf 75-1 VPVVGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIG 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 75a . pep m ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i = I I I I I I I I I I I M I I I I 1 I 
orf 7 5-1 ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD 
190 200 210 220 230 240 

250 260 270 280 290 

or f 7 5a . pep EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 



Homology with a predicted ORF from N.gonorrhoeae 

ORF75 shows 93.2% identity over a 292aa overlap with a predicted ORF (ORF75.ng) from N. 
gonorrhoeae: 

orf 75 .pep MFVFQT AFXMFQKHLQKAS D S WGGTLYWAT P I GNLAD I TLRALAVLQKA AEDTR 5 6 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I 

orf75ng MSVFQTAFFMFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTR 60

orf75.pep VTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLAR 116

|||||||||||||:||||||||IIIIIII::|:IIII:IIIIIMIIIIIIIIIIIMM _ 
orf 7 5ng VTAQLLSAYGIQGRLVSVREHNERQMADKVIGFLSDGLWAQVSDAGTPAVCDPGAKLAR 120 

orf7 5 Dec RVREAGFKWPWGAXAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIV 17 6 
M I I I I I I I I I I I I I 1 11 I I I I I I I I I M I I I I M I 1 I I I I M I I I I II II II I I : I 

orf75ng rvreagfkvvpwgasavmaalsvagvaesdfyfngfvppksgerrklfakwvraafpw 180 

orf75 pep MFETPHRIGAALADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGDQSRGEM 236 

I I I I I I I I I I : I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I : I I I : I I I I II 
or f 7 5ng MFETPHRIGATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEM 24 0 

orf75 pep VLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYD 288 

I I I I I I I I I I I II I M I : I I I I I I I I I I I I ! 

orf75ng VLVLYPAQDEKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLALSWKNK 300 

An ORF75ng nucleotide sequence <SEQ ID 291> was predicted to encode a protein having amino
acid sequence <SEQ ID 292>:

1 MSVFQTAFFM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK

51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLVV

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGASAVMA ALSVAGVAES

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP

201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE

251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK 

301 * 

After further analysis, the following gonococcal DNA sequence <SEQ ID 293> was identified: 

1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATTTGTGC CGAAGACACG 

151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT

201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG GTAATCGGTT 

251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG 

351 GTTCAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA 

401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC 

501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG 

551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA 

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCTGCG

751 CAAAATGCGA TGAAAATCCT TGCGGCCGAG CTGCCGACCA AGCAGGCGGC

801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT 

851 TGGCACTGTC GTGGAAAAAC AAATGA 

This corresponds to the amino acid sequence <SEQ ID 294; ORF75ng-1>:

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT

51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVAE SDFYFNGFVP

151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 



ORF75ng-1 and ORF75-1 show 96.2% identity in 291 aa overlap:



orf75-1.pep MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf75ng-1   MFQKHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAY

10 20 30 40 50 60

70 80 90 100 110 120

orf75-1.pep GIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKV
            ||||:|||||||||||||||::|:||||:|||||||||||||||||||||||||||||||
orf75ng-1   GIQGRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKV
70 80 90 100 110 120

130 140 150 160 170 180

orf75-1.pep VPVVGASAVMAALSVAGVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIG
|| | | | | | | | | 1 M I I I I I I I I I I I I I 1 I I I I I I I I I I I I M i I I I II: I I I I I I I I I I
orf75ng-1   VPVVGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIG
130 140 150 160 170 180

190 200 210 220 230 240

orf75-1.pep ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALSADGNQSRGEMVLVLYPAQD
            ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf75ng-1   ATLADMAELFPERRLMLAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQD

190 200 210 220 230 240

250 260 270 280 290

orf75-1.pep EKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX

M I 1 I I I I I I II I I I I '■ I I I I I I I I 1 I I I I I I I I I I I I I I I I I 1 I I

orf75ng-1   EKHEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLALSWKNKX
250 260 270 280 290

Furthermore, ORF75ng-1 shows significant homology to a hypothetical E. coli protein:

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION
(F286)

>gi|606086 (U18997) ORF_f286 [Escherichia coli]

>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic
region [Escherichia coli] Length = 286 
Score = 218 bits (550), Expect = 3e-56 

Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%) 

Query: 4 KHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ 63
K Q A +S G LY+V TPIGNLADIT RAL VLQ D+I AEDTR T LL +GI
Sbjct: 2 KQHQSADNSQ--GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 59

Query: 64 GRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPV 123
RL ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG L R REAG +VVP+
Sbjct: 60 ARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPGYHLVRTCREAGIRVVPL 119

Query: 124 VGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATL 183
G A + ALS AG+ F + GF+P KS RR ++ +E+ HR+ +L
Sbjct: 120

Query: 184
+ E R ++LARE+TKT+ET VGE+ + D N+ +GEMVL++ +
Sbjct: 180 VLGE SRYWLARE LTKTWET IHGAPVGELLAWVKEDENRRKGEMVL IV-EGHKAQ 238

Query: 243 E S AQNAMKI LAAE L PTKQAAE LAAK I TGEGKKALYD LAL 286
A + +L AELP K+AA LAA+I G K ALY AL
Sbjct: 239 ADALRTLALLQAELPLKKAAALAAE I HGVKKNAL YKY AL 282

Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 35 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 295>:

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG 

51 TTTTGCGGCA GC.AAAGCAC CCGAAATCGA CCCGGCTTTG 

// 

651 GAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC

751 AAACCGTAA

This corresponds to the amino acid sequence <SEQ ID 296; ORF76>: 

1 MKQKKTAAAV IAAMLAGFAA XKAPEIDPAL 

// 

201 ELVRNQLEQG LRQEKARLKI DALLEENGVK 

251 P* 

Further work revealed the complete nucleotide sequence <SEQ ID 297>: 

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA 

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTACAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAGACGAGCT 

351 GCACAAGTTT TACGAACAGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC 

501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC 

551 AGTTTGCCGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG 

601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAGCAG GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAAGA AAACGGTGTC 

751 AAACCGTAA 

This corresponds to the amino acid sequence <SEQ ID 298; ORF76-1>:

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ
51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE
101 EYVRFLERSE TVSEDELHKF YEQQIRMIKL QQVSFATEEE ARQAQQLLLK
151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL
201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV
251 KP*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF76 shows 96.7% identity over a 30aa overlap and 96.8% identity over a 31aa overlap with an 
ORF (ORF76a) from strain A of N. meningitidis: 



orf7 6.pep MKQKKTAAAVIAAMLAGFAAXKA PEIDPAL 
1 I I I I I i I [ I I I I I I II I I I II I I I I I I I 
orf 7 6a MKQKKTAAAVIAAMLAGFAAAKA PEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND 
10 20 30 40 50 60 

// 

70 80 90 

orf 7 6 pep XELVRNQLEQGLRQEKARLKIDALLEENGVKPX 

I I II I I I I I I I I I I I I I I I I I I : II II I I II I 
orf 7 6a DVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLKIDAILEENGVKPX 
200 210 220 230 240 250 

The complete length ORF76a nucleotide sequence <SEQ ID 299> is: 

1 ATGAAACAGA AAAAAACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG

51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC 

101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA 

151 AAACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGTC GGCTGCAAAC 

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAAGCGCACT 

351 GCGTCAGTTT TATGAGCGGC AAATCCGCAT GATCAAATTG CAGCAGGTCA 






401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA 
451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC

501 TTTTGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTCGC 

551 AGTTTGCAGC GATGAATCGG GGCGACGTTA CCCGCGATCC GGTCAAATTG 

601 GGCGAACGCT ATTATCTGTT CAAACTCAGC GAGGTCGGGA AAAACCCCGA

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAACAA GGTTTGAGAC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCA TTTTGGAAGA AAACGGTGTC 

751 AAACCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 300>: 

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ

51 KPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE 

101 EYVRFLERSE TVSESALRQF YERQIRMIKL QQVSFATEEE ARQAQQLLLK 

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAAMNR GDVTRDPVKL 

201 GERYYLFKLS EVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDAILEENGV 

251 KP*

ORF76a and ORF76-1 show 97.6% identity in 252 aa overlap: 

10 20 30 40 50 60

orf76a.pep MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf76-1    MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND

10 20 30 40 50 60

70 80 90 100 110 120

orf76a.pep AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSESALRQF
           |||||||||||||||||||||||||||||||||||||||||||||||||||||| :|::|
orf76-1    AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF
70 80 90 100 110 120

130 140 150 160 170 180

orf76a.pep YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP
           ||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf76-1    YEQQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP
130 140 150 160 170 180

190 200 210 220 230 240

orf76a.pep LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf76-1    LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK
190 200 210 220 230 240

250

orf76a.pep IDAILEENGVKPX
           |||:|||||||||
orf76-1    IDALLEENGVKPX

250

Homology with a predicted ORF from N. gonorrhoeae 

The aligned aa sequences of ORF76 and a predicted ORF (ORF76.ng) from N. gonorrhoeae of the 
N- and C-termini show 96.7 % and 100% identity in 30 and 31 overlap, respectively: 

orf76.pep MKQKKTAAAVIAAMLAGFAAXKAPEIDPAL 30

orf76ng MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQRPDGQAIRND 60

//

orf76.pep ELVRNQLEQGLRQEKARLKIDALLEENGVKP 251

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I

orf76ng VTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLKIDALLEENGVKP 251

The complete length ORF76ng nucleotide sequence <SEQ ID 301> is: 

1 ATGAAACAGA AAAAGACCGC TGCCGCAGTT ATTGCTGCAA TGTTGGCAGG
51 TTTTGCGGCA GCCAAAGCAC CCGAAATCGA CCCGGCTTTG GTGGATACGC
101 TGGTGGCGCA GATCATGCAG CAGGCAGACC GGCATGCGGA GCAGTCCCAA
151 AGACCGGACG GGCAGGCAAT CCGAAACGAT GCCGTCCGCC GGCTGCAAAC

201 TTTGGAAGTT TTGAAAAACA GGGCATTGAA GGAAGGTTTG GATAAGGATA 

251 AGGATGTCCA AAACCGCTTT AAAATCGCCG AAGCGTCTTT TTATGCCGAG 

301 GAGTACGTCC GTTTTCTGGA ACGTTCGGAA ACGGTTTCCG AAAGCGCACT 

351 GCGTCAGTTT TATGAGCGGC AAATCCGCAT GATCAAATTG CAGCAGGTCA

401 GCTTCGCAAC CGAAGAGGAG GCGCGTCAGG CGCAGCAGCT CCTGCTCAAA

451 GGGCTGTCTT TTGAAGGGCT GATGAAGCGT TATCCGAACG ACGAGCAGGC

501 GTTCGACGGT TTCATTATGG CGCAGCAGCT TCCCGAGCCG CTGGCTTcgc

551 agtttgCCGG TATGAACCGT GGCGACGTTA CCCGCAATCC GGTCAAATTG 

601 GGCGAACGCT ATTACCTGTT CAAACTCGGC GCGGTCGGGA AAAACCCCGA 

651 CGCGCAGCCT TTCGAGTTGG TCAGAAACCA GTTGGAACAA GGTTTGAGGC 

701 AGGAAAAAGC CCGCTTGAAA ATCGATGCCC TTTTGGAaga Aaacggtgtc 

751 AaacCGTAA 

This encodes a protein having amino acid sequence <SEQ ID 302>: 

1 MKQKKTAAAV IAAMLAGFAA AKAPEIDPAL VDTLVAQIMQ QADRHAEQSQ

51 RPDGQAIRND AVRRLQTLEV LKNRALKEGL DKDKDVQNRF KIAEASFYAE

101 EYVRFLERSE TVSESALRQF YERQIRMIKX QQVSFATEEE ARQAQQLLLK

151 GLSFEGLMKR YPNDEQAFDG FIMAQQLPEP LASQFAGMNR GDVTRNPVKL

201 GERYYLFKLG AVGKNPDAQP FELVRNQLEQ GLRQEKARLK IDALLEENGV

251 KP* 

ORF76ng and ORF76-1 show 96.0% identity in 252 aa overlap 

10 20 30 40 50 60

orf76-1.pep MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQKPDGQAIRND
            ||||||||||||||||||||||||||||||||||||||||||||||||||:|||||||||
orf76ng     MKQKKTAAAVIAAMLAGFAAAKAPEIDPALVDTLVAQIMQQADRHAEQSQRPDGQAIRND
10 20 30 40 50 60


70 80 90 100 110 120 

orf 7 6-1 . pep AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSEDELHKF 

I Mill II I I II I I II I : I : : I 

orf7 6ng AVRRLQTLEVLKNRALKEGLDKDKDVQNRFKIAEASFYAEEYVRFLERSETVSESALRQF 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 7 6-1 .pep YEQQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP 
II : I M I I I li I I II I I I I I I I I I I I I I II I I I M II I I I II II I I II II M I II II II I 
orf7 6ng YERQIRMIKLQQVSFATEEEARQAQQLLLKGLSFEGLMKRYPNDEQAFDGFIMAQQLPEP 

130 140 150 160 170 180 



190 200 210 220 230 240 

or f 7 6-1 . pep LASQFAAMNRGDVTRDPVKLGERYYLFKLSEVGKNPDAQPFELVRNQLEQGLRQEKARLK 
II M M : \ II I II II : \ I I I I II I I M I I : I I I II M I I I I I I I I I M I M II I II I II 
orf7 6ng LASQFAGiMNRGDVTRNPVKLGERYYLFKLGAVGKNPDAQPFELVRNQLEQGLRQEKARLK 

190 200 210 220 230 240 



orf76-1.pep
orf76ng



Furthermore, ORF76ng shows significant homology to a B. subtilis export protein precursor:



sp|P24327|PRSA_BACSU PROTEIN EXPORT PROTEIN PRSA PRECURSOR >gi|98227|pir||S15269
33K lipoprotein - Bacillus subtilis >gi|39782 (X57271) 33kDa lipoprotein
[Bacillus subtilis]

>gi|2226124|gnl|PID|e325181 (Y14077) 33kDa lipoprotein [Bacillus subtilis]
>gi|2633331|gnl|PID|e1182997 (Z99109) molecular chaperonin [Bacillus subtilis]
Length = 292
Score = 50.4 bits (118), Expect = 1e-05

Identities = 48/199 (24%), Positives = 82/199 (41%), Gaps = 32/199 (16%) 



Query: 70 VLKNRALKEGLDK DKDVQNRFKI AEAS F YAEEYVRFLERSETVSE 114

VL ++ LDK DK++ N+ K + Y ++Y++ + E +++

Sbjct: 53 VLTQLVQEKVLDKKYKVSDKEIDNKLKEYKTQLGDQYTALEKQYGKDYLKEQVKYELLTQ 112

Query: 115

Sbjct: 113

Query: 164

G+V+ DPVK Y++ K +E

Sbjct: 173

Query: 219

Sbjct: 232



Based on this analysis, including the presence of a putative leader sequence and an RGD motif in
the gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
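
The RGD motif mentioned above can be located with a plain substring search. The sketch below is illustrative: `find_motif` is a hypothetical helper, and the fragment is residues 181-200 of the ORF76ng protein as listed in <SEQ ID 302>.

```python
# Minimal sketch: report 1-based positions of a short motif within a protein sequence.
def find_motif(sequence, motif, offset=1):
    """Return the positions of every occurrence of motif.

    offset is the residue number of the first character of `sequence`,
    so fragments can be searched while reporting full-length coordinates.
    """
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + offset)
        start = sequence.find(motif, start + 1)
    return positions


# Residues 181-200 of ORF76ng, copied from the protein listing.
FRAGMENT_181_200 = "LASQFAGMNRGDVTRNPVKL"

print(find_motif(FRAGMENT_181_200, "RGD", offset=181))  # [190]
```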

ORF76-1 (27.8kDa) was cloned in the pET vector and expressed in E. coli, as described above. The
products of protein expression and purification were analyzed by SDS-PAGE. Figure 10A shows
the results of affinity purification of the His-fusion protein. Purified His-fusion protein was used
to immunise mice, whose sera were used for Western blot (Figure 10B), ELISA (positive result),
and FACS analysis (Figure 10C). These experiments confirm that ORF76-1 is a surface-exposed
protein, and that it is a useful immunogen.

Example 36 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 303>:

1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC

51 CAGCGAAATT GCC^TACCCC TTGGAATTGG GGATTGAAAC CTTACCGGCG 

101 GCAAAAATTG CGGAAACGTT TGCGCTGACA TTTGTGATTG CTGCGCTGTA 

151 TCTGTTTGCG CGTAATAAGG TGACGCGTTT GTTGATTGCG GTGTTTTTTG 

201 CGTTCAGCAT TATTGCCAAC AATGTGCATT ACGCGGATTA TCAAAGCTGG 

251 ATGACG

// 

1201 CAAACCGTAT TCGAGCAGCT GCAAAAGACT CCTGACGGCA 

1251 ACTGGCTGTT TGCCTATACC TCCGATCATG GCCAGTATGT TCGCCAAGAT 

1301 ATCTACAATC AAGGCACGGT GCAGCCCGAC AGCTATCTCG TGCCGCTAGT 

1351 GTTGTACAGC CCGGATAAGG CCGTGCAACA GGCTGCCAAC CAGGCTTTTG

1401 CGCCTTGCGA GATTGCCTTC CATCAGCAGC TTTCAACGTT CCTGATTCAC

1451 ACGTTGGGCT ACGATATGCC GGTTTCAGGT TGTCGCGAAG GCTCGGTAAC 

1501 GGGCAACCTG ATTACGGGTG ATGCAGGCAG CTTGAACATT CGCGACGGCA 

1551 AGGCGGAATA TGTTTATCCG CAATGA 

This corresponds to the amino acid sequence <SEQ ID 304; ORF81>:

1 MKKSFLTLVL YSSLLTASEI AYPLELGIET LPAAKIAETF ALTFVIAALY 

51 LFARNKVTRL LIAVFFAFSI IANNVHYADY QSWMT 

// 

401 . . . QTVFEQL QKTPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 
451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT

501 GNLITGDAGS LNIRDGKAEY VYPQ* 

Further work revealed the complete nucleotide sequence <SEQ ID 305>: 

1 ATGAAAAAAT CTTTCCTTAC GCTTGTTCTG TATTCGTCTT TACTTACCGC 

51 CAGCGAAATT GCCTATCGCT TTGTATTTGG GATTGAAACC TTACCGGCGG

101 CAAAAATTGC GGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT

151 CTGTTTGCGC GTTATAAGGT GACGCGTTTG TTGATTGCGG TGTTTTTTGC 

201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA 

251 TGACGGGCAT CAATTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC

301 AGCGCGGGTG CGTCGATGTT GGATAAGTTG TGGCTGCCTG TGTTGTGGGG

351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA

401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC

451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC

501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC 

551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAGGATTCC CGCCTTTAAG

601 CAGCCTGCTC CAAGCAAAAT CGGGCAGGGC AGTGTTCAAA ATATCGTCCT 

651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAGCTG TTTGGCTACG 

701 GACGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG 

751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACTG CAGTGTCCCT 

801 GCCCAGTTTT TTCAATGCGA TACCGCACGC CAACGGCTTG GAACAAATCA

851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA 

901 ACGTATTTTT ACAGCGCGCA GGCGGAAAAC GAGATGGCGA TTTTGAACTT 

951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT 

1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC 

1051 AAAATCAATT TGCAGCAGGG CAAGCATTTT ATCGTGTTGC ACCAACGCGG

1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG 

1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 

1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 

1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA 

1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTAGTG

1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 

1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 

1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG 

1501 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA 

1551 GGCGGAATAT GTTTATCCGC AATGA

This corresponds to the amino acid sequence <SEQ ID 306; ORF81-1>:

1 MKKSFLTLVL YSSLLTASEI AYRFVFGIET LPAAKIAETF ALTFVIAALY 

51 LFARYKVTRL LIAVFFAFSI IANNVHYAVY QSWMTGINYW LMLKEVTEVG 

101 SAGASMLDKL WLPVLWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF 

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSRIPAFK

201 QPAPSKIGQG SVQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNAIPHANGL EQISGGDTNM FRLAKEQGYE 

301 TYFYSAQAEN EMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD
351 KINLQQGKHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD

401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV

451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT
501 GNLITGDAGS LNIRDGKAEY VYPQ* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 
ORF81 shows 84.7% identity over an 85aa overlap and 99.2% identity over a 121aa overlap with
an ORF (ORF81a) from strain A of N. meningitidis:

10 20 30 40 50 60 

orf81.pep MKKSFLTLVLYSSLLTASEIAYPLELGIETLPAAKIAETFALTFVIAALYLFARNKVTRL
I | | | : : : I I I I I I : : I I I I I I I I I : I I ! i M I i I I M I I I I I I I : I I I

orf81a MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFVIAALYLFARYKATRL

orf81.pep LIAVFFAFSIIANNVHYADYQSWMT

//

120 130 140

orf81.pep QTVFEQLQKTPDGNWLFAYTSDHGQYVRQD
I 1 I I I I I I I I I I I I II I I 1 I I I I I I I I I I
orf81a IPHANGLEQISGGDIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD
280 290 300 310 320 330

150 160 170 180 190 200

orf81.pep IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG

| | | | | | | M I I I I I I I I M I M II I I I I I I I I M 1 II I I M I [ M I I I I I I I

orf81a IYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTFLIHTLGYDMPVSG
340 350 360 370 380 390

210 220 230

orf81.pep CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
I I I I I I II I I I I I I I I I I I I I I I I I I I M II I
orf81a CREGSVTGNLITGDAGSLNIRDGKAEYVYPQX
400 410 420



The complete length ORF81a nucleotide sequence <SEQ ID 307> is: 



1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCGTCCC TACTTACTGC
51 CAGCGAAATT GCTTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG
101 CAAAAATGGC AGAAACGTTT GCGCTGACAT TTGTGATTGC TGCGCTGTAT
151 CTGTTTGCGC GTTATAAGGC AACGCGTTTG TTGATTGCGG TGTTTTTCGC
201 GTTCAGCATT ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA
251 TAACGGGCAT TAATTATTGG CTGATGCTGA AAGAGATTAC CGAAGTTGGC
301 GGCGCAGGGG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CGTTGTGGGG 
351 CGTGTTGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA 
401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC 
451 GTGCGTTCGT TCGACACGAA ACAAGAACAC GGTATTTCGC CCAAACCGAC 
501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGAC 
551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATTCC TGTGTTCAAA 
601 CAGCCTGCTC CAAGCAGAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT 
651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGCTACG 
701 GGCGCGAAAC TTCGCCGTTT TTGACCCAGC TTTCGCAAGC CGATTTTAAG
751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT
801 GCCCAGTTTC TTTAACGTCA TACCGCATGC CAACGGCTTG GAACAAATCA 
851 GCGGCGGCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC 
901 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA 
951 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTT CGCCAAGATA 
1001 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATCTCGT GCCGCTGGTG 
1051 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC 
1101 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA 
1151 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACG 
1201 GGCAACCTGA TTACGGGTGA TGCAGGCAGC TTGAACATTC GCGACGGCAA 
1251 GGCGGAATAT GTTTATCCGC AATGA 
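
The listings in this document print 50 residues per row in blocks of ten, with each row prefixed by the position of its first residue. The following is a minimal sketch of that layout, useful for checking a listing whose columns have been disturbed; the function names `format_listing` and `parse_listing` are illustrative and do not come from the application.

```python
def parse_listing(text):
    """Strip position numbers and spacing, keeping residue letters and '*'."""
    return "".join(ch for ch in text if ch.isalpha() or ch == "*")


def format_listing(seq, per_row=50, block=10):
    """Return rows of `per_row` residues in blocks of `block`,
    each prefixed by the 1-based position of its first residue."""
    seq = parse_listing(seq)
    rows = []
    for start in range(0, len(seq), per_row):
        chunk = seq[start:start + per_row]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        rows.append(f"{start + 1} " + " ".join(blocks))
    return rows


if __name__ == "__main__":
    # First 60 nucleotides of the ORF81a listing above.
    dna = "ATGAAAAAATCCCTTTTCGTTCTCTTTCTGTATTCGTCCCTACTTACTGCCAGCGAAATT"
    for row in format_listing(dna):
        print(row)
```

Round-tripping a row through `parse_listing` and `format_listing` leaves it unchanged when the row is intact, which gives a quick consistency check on a transcribed listing.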

This encodes a protein having amino acid sequence <SEQ ID 308>: 

1 MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAETF ALTFVIAALY

51 LFARYKATRL LIAVFFAFSI IANNVHYAVY QSWITGINYW LMLKEITEVG

101 GAGASMLDKL WLPALWGVLE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK 

201 QPAPSRIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTQLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDIVD KYDNTIHKTD 

301 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYLVPLV 

351 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

401 GNLITGDAGS LNIRDGKAEY VYPQ* 
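
The correspondence between a nucleotide listing and the protein it encodes can be checked by translating the reading frame codon by codon. The sketch below assumes the standard genetic code (the codon assignments also used for bacteria) and reads from the first base; the names `CODON_TABLE` and `translate` are illustrative, and codons containing ambiguity symbols are reported as X.

```python
BASES = "TCAG"
# Amino acids for the 64 codons, ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1; '*' marks a stop codon,
    'X' marks a codon that is not a plain A/C/G/T triplet."""
    dna = "".join(ch for ch in dna.upper() if ch.isalpha())
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # First 30 and last 15 nucleotides of the ORF81a sequence (SEQ ID 307).
    print(translate("ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG"))
    print(translate("GTTTATCCGC AATGA"))
```

Applied to the start and end of SEQ ID 307, this gives the first ten residues and the final residues of SEQ ID 308.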

ORF81a and ORF81-1 show 77.9% identity in 524 aa overlap: 

10 20 30 40 50 60 

orf 8 la . pep MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFVIAALYLFARYKATRL 

orf81-l MKKSFLTLVLYSSLLTASEIAYRFVFGIETLPAAKIAETFALTFVIAALYLFARYKVTRL 



70 80 90 100 110 120 

orf 8 la. pep LIAVFFAFSIIANNVHYAVYQSWITGINYWLMLKEITEVGGAGASMLDKLWLPALWGVLE 
I I I I I I I I I I I I I I M I II I [ II : I I I I I I I II II : I I I I : I I I I I I I I II I I : I I II I I 
orf 81-1 LIAVFFAFSIIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPVLWGVLE 

70 80 90 100 110 120 
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orf 81a. pep 
orf81-l 



130 140 150 160 170 180 

VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY 
I I I I I I i I I I I I I I I I I ( I I I I I I I I i I I M M M II M I I I I I I I ! I 11 II I I I II I I I 
VMLFCSLAKFRRKTHFSADILFAFLMLMIFVRSFDTKQEHGISPKPTYSRIKANYFSFGY 

130 140 150 160 170 180 

190 200 210 220 230 240 

FVGRVLPYQLFDLSKI PVFKQPAPSRIGQGS I QNI VLIMGE SE SAAHLKLFGYGRET SPF 
I I II II I I I I I I I I : II : I i I II I I : II I I I : I I II I I 1 I I I I I I I I M I I I I I I I I I II 
FVGRVLPYQLFDLSRIPAFKQPAPSKIGQGSVQNIVLIMGESESAAHLKLFGYGRETSPF 

190 200 210 220 230 240 

250 260 270 280 

LTQLSQADFKPIVKQSYSAGFMTAVSLPSFFNVIPHANGLEQISGGD 

I I : I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I : I I I I I I I I I I I I II 
LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNAIPHANGLEQISGGDTNMFRLAKEQGYE 

250 260 270 280 290 300 



30 



290 300 310 320 

orf 81a . pep IVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 

I I I I II I I I I I I I I I I I I I II I II ! I I I II I I I 
orf 81-1 IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 
370 380 390 400 410 420 

330 340 350 360 370 380 

orf 81a. pep AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 
M I M M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I II II || || I I I | 
orf81-l AYT S DHGQYVRQDI YNQGTVQPDSYLVPLVLYS PDKAVQQAANQAFAPCE IAFHQQLSTF 

35 430 440 450 460 470 480 

390 400 410 420 

orf 81a . pep LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX 
.„ I I I I I I I I I I I I M I I I I I I I I I I I I [ I I I I I I II I I I M II II I 

40 orf 8 1-1 LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRDGKAEYVYPQX 

490 500 510 520 

Homology with a predicted ORF from N. gonorrhoeae 

The aligned aa sequences of the N- and C-termini of ORF81 and a predicted ORF (ORF81.ng) from
N. gonorrhoeae show 82.4% and 97.5% identity in 85 aa and 121 aa overlaps, respectively:

orf 81 . pep MKKSFLTLVLYSSLLTASEIAYPLELGIETLPAAKIAETFALTFVIAALYLFARNKVTRL 60 

I I I I : I I I I I I I I I I I I I I : : II I I I I I I | : II II I I I I : I | | | | | | | | | :: | | 
orf81ng MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL 60 

50 orf 81. pep LIAVFFAFSIIANNVHYADYQSWMT 85 

i I f N I I I I : I M M M I MINI 

or f 8 lng LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE 120 

// 

orf 81. pep QTVFEQLQKTPDGNWLFAYTSDHGQYVRQD 433 

55 I I I I I i I I I I I I I I I I I I I I I I I I I [ I I I 

orf 8 lng ALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLFAYTSDHGQYVRQD 433 

orf81 .pep I YNQGTVQPDSYLVPLVLYS PDKAVQQAANQAFAPCE I AFHQQLSTFLIHTLGYDMPVSG 493 
, n I I I M M II I I I : I I I I I I I I I I I I I I | | I | | | | | | | | | | | | | | | | | | | | | | | || |] || | 

° u orf81ng I YNQGTVQPDSYIVPLVLYS PDKAVQQAANQAFAPCE IAFHQQLSTFLIHTLGYDMPVSG 4 93 

orf 81. pep CREGSVTGNLITGDAGSLNIRDGKAEYVYPQ 524 

M M I I I I I I I I II I I I I I I I : I I I I I I I I | 
orf 8 lng CREGSVTGNLITGDAGSLNIRNGKAEYVYPQ 524 



The complete length ORF81ng nucleotide sequence <SEQ ID 309> is:

1 ATGAAAAAAT CCCTTTTCGT TCTCTTTCTG TATTCATCCC TACTTACCGC
51 CAGCGAAATC GCCTATCGCT TTGTATTCGG AATTGAAACC TTACCGGCTG
101 CAAAAATGGC GGAAACGTTT GCGCTGACAT TTATGATTGC TGCGCTGTAT
151 CTGTTTGCGC GTTATAAGGC TTCGCGGCTG CTGATTGCGG TGTTTTTCGC
201 GTTCAGCATG ATTGCCAACA ATGTGCATTA CGCGGTTTAT CAAAGCTGGA
251 TGACGGGTAT TAACTATTGG CTGATGCTGA AAGAGGTTAC CGAAGTCGGC
301 AGCGCGGGCG CGTCGATGTT GGATAAGTTG TGGCTGCCTG CTTTGTGGGG
351 CGTGGCGGAA GTCATGTTGT TTTGCAGCCT TGCCAAGTTC CGCCGTAAGA
401 CGCATTTTTC TGCCGATATA CTGTTTGCCT TCCTAATGCT GATGATTTTC
451 GTGCGTTCGT TCGACACGAA ACAAGAGCAC GGTATTTCGC CCAAACCGAC
501 ATACAGCCGC ATCAAAGCCA ATTATTTCAG CTTCGGTTAT TTTGTCGGGC
551 GCGTGTTGCC GTATCAGTTG TTTGATTTAA GCAAGATCCC TGTGTTCAAA
601 CAGCCTGCTC CAAGCAAAAT CGGGCAAGGC AGTATTCAAA ATATCGTCCT
651 GATTATGGGC GAAAGCGAAA GCGCGGCGCA TTTGAAATTG TTTGGTTACG
701 GGCGCGAAAC TTCGCCGTTT TTAACCCGGC TGTCGCAAGC CGATTTTAAG
751 CCGATTGTGA AACAAAGTTA TTCCGCAGGC TTTATGACGG CAGTATCCCT
801 GCCCAGTTTC TTTAACGTCA TACCGCACGC CAACGGCTTG GAACAAATCA
851 GCGGCGGCGA TACCAATATG TTCCGCCTCG CCAAAGAGCA GGGCTATGAA
901 ACGTATTTTT ACAGTGCCCA GGCTGAAAAC CAAATGGCAA TTTTGAACTT
951 AATCGGTAAG AAATGGATAG ACCATCTGAT TCAGCCGACG CAACTTGGCT
1001 ACGGCAACGG CGACAATATG CCCGATGAGA AGCTGCTGCC GTTGTTCGAC
1051 AAAATCAATT TGCAGCAGGG CAGGCATTTT ATCGTGTTGC ACCAACGCGG
1101 TTCGCACGCC CCATACGGCG CATTGTTGCA GCCTCAAGAT AAAGTATTCG
1151 GCGAAGCCGA TATTGTGGAT AAGTACGACA ACACCATCCA CAAAACCGAC
1201 CAAATGATTC AAACCGTATT CGAGCAGCTG CAAAAGCAGC CTGACGGCAA
1251 CTGGCTGTTT GCCTATACCT CCGATCATGG CCAGTATGTG CGCCAAGATA
1301 TCTACAATCA AGGCACGGTG CAGCCCGACA GCTATATTGT GCCTCTGGTT
1351 TTGTACAGCC CGGATAAGGC CGTGCAACAG GCTGCCAACC AGGCTTTTGC
1401 GCCTTGCGAG ATTGCCTTCC ATCAGCAGCT TTCAACGTTC CTGATTCACA
1451 CGTTGGGCTA CGATATGCCG GTTTCAGGTT GTCGCGAAGG CTCGGTAACA
1501 GGCAACCTGA TTACGGGCGA TGCAGGCAGC TTGAACATTC GCAACGGCAA
1551 GGCGGAATAT GTTTATCCGC AATAA


This encodes a protein having amino acid sequence <SEQ ID 310>: 



1 MKKSLFVLFL YSSLLTASEI AYRFVFGIET LPAAKMAETF ALTFMIAALY 

51 LFARYKASRL LIAVFFAFSM IANNVHYAVY QSWMTGINYW LMLKEVTEVG

101 SAGASMLDKL WLPALWGVAE VMLFCSLAKF RRKTHFSADI LFAFLMLMIF

151 VRSFDTKQEH GISPKPTYSR IKANYFSFGY FVGRVLPYQL FDLSKIPVFK 

201 QPAPSKIGQG SIQNIVLIMG ESESAAHLKL FGYGRETSPF LTRLSQADFK 

251 PIVKQSYSAG FMTAVSLPSF FNVIPHANGL EQISGGDTNM FRLAKEQGYE 

301 TYFYSAQAEN QMAILNLIGK KWIDHLIQPT QLGYGNGDNM PDEKLLPLFD

351 KINLQQGRHF IVLHQRGSHA PYGALLQPQD KVFGEADIVD KYDNTIHKTD 

401 QMIQTVFEQL QKQPDGNWLF AYTSDHGQYV RQDIYNQGTV QPDSYIVPLV 

451 LYSPDKAVQQ AANQAFAPCE IAFHQQLSTF LIHTLGYDMP VSGCREGSVT 

501 GNLITGDAGS LNIRNGKAEY VYPQ* 

ORF81ng and ORF81-1 show 96.4% identity in 524 aa overlap:



MKKSLFVLFLYSSLLTASEIAYRFVFGIETLPAAKMAETFALTFMIAALYLFARYKASRL 
I I I I : : : I I I I I I I I I I I I I I I I I I I I M I I I I I : I I I I I I I I : I I I I I I I I i I 1 :: I I 
MKKS FLTLVLYS S LLTASE IAYRFVFGIETLPAAKIAET FALT FVI AALYLFARYKVTRL 



70 80 90 100 110 120 

or f 8 lng-1 . pep LIAVFFAFSMIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAE 

cc I M I I I M I : N I I I II I M I I M I I I! I I I ! I I I M I I M I I I I ] I | | | | | [: | | | i | 

JJ orf 81-1 LIAVFFAFSIIANNVHYAVYQSWMTGINYWLMLKEVTEVGSAGASMLDKLWLPVLWGVLE 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 8 lng-1 .pep VMLFCSLAKFRRKTHFS ADI LFAFLMLMI FVRS FDTKQEHG I S PKPTYSRIKANYFS FGY 
6U I i I I i I I I I I I I M I I I I I II I I I I I I I I I I | i I | | | | | | | | | | | | | | | | M I I I I I I I I 

orf 81-1 VMLFCSLAKFRRKTHFSADI LFAFLMLMI FVRSFDTKQEHGIS PKPTYSRIKANYFS FGY 

!30 140 150 160 170 180 

At . 190 200 210 220 230 240 
0:5 orf8 lng-1 .pep FVGRVLPYQLFDLSKIPVFKQPAPSKIGQGSIQNIVLIMGESESAAHLKL FGYGRETSPF 
I I 1 M I M II I I I I : I I : I I I I I I I : M I I II I I I I I I I I I I M I I I 
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250 260 270 280 290 300 

orf 81ng-l . pep LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNVTPHANGLEQISGGDTNMFRLAKEQGYE 
I I I I I I I I I I I II 1 I I I I I I I I I I I I 1 1 I I I I : I I I I I I I I I I I I I I I I I I I I I I I M I I 
orf 81-1 LTRLSQADFKPIVKQSYSAGFMTAVSLPSFFNAIPHANGLEQISGGDTNMFRLAKEQGYE 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 81ng-l.pep TYFYSAQAENQMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGRHF 
I II I II I I I 1 : I II I I I I I ! I I II I II II II II I M M I I M N I I I I M M II M I : M 
orf 81-1 TYFYSAQAENEMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQQGKHF 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 81ng-l . pep IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 
I I I II I I M I I I I II I I I II I I I I I I I I I I I II I I I I I I I I I I I I II 1 I I I I I I I II II I 
orf 81-1 IVLHQRGSHAPYGALLQPQDKVFGEADIVDKYDNTIHKTDQMIQTVFEQLQKQPDGNWLF 

370 380 390 400 410 420 

430 440 450 460 470 480 

orf 81ng-l.pep AYT S DHGQ YVRQD I YNQGTVQP DS YI VPLVLY S P DKAVQQAANQAFAPCE I AFHQQL S T F 
M I I I I I I I ! I I I I I I ! I I I ! I I II : I I I I I I I I I I I I I I I I I I I [ I I I I I I I I I I I I M 
orf 81-1 AYTSDHGQYVRQDIYNQGTVQPDSYLVPLVLYSPDKAVQQAANQAFAPCEIAFHQQLSTF 

430 440 450 460 470 480 



orf 81ng-l .pep 
orf81-l 



490 500 510 520 

LIHTLGYDMPVSGCREGSVTGNLITGDAGSLNIRNGKAEYVYPQX 



Furthermore, ORF81ng shows significant homology to an E. coli OMP:

gi|1256380 (U50906) outer membrane adherence protein-associated protein [E. coli] Length = 547
Score = 87.4 bits (213), Expect = 2e-16
Identities = 122/468 (26%), Positives = 19i/468 (42%), Gaps = 70/468 (14%)




Query: 


25 


Sbjct: 


29 




82 


Sbjct: 


87 




135 


Sbjct: 


142 


Query: 


184 


Sbjct: 


202 


Query: 


242 


Sbjct: 


258 




299 


Sbjct: 


311 




356 


Sbjct: 


360 




413 



VFGIETLPAAKMAETFA-LTFMIAALYLFARYKAS— RLL I AVFFAFSMI ANNVHYAVYQ 81 
VFGI LA+A LF+++R + RLL+A F + A ++ ++Y 

VFGITNLVASSGAHMVQRLLFFVLT ILWKRI S SLPLRLLVAAP FVL— LTAADMS I SLY— 8 6 

SWMT GINYWLMLKEVTEVGSAGASMLDKLWLPALWGVAEVMLFCSLAKFRRKT 134 

SW T G ++ + EV A ML ++ PL A + L + 
SWCTFGTTFNDGFAI SVLQSDPDEV AKMLG-MYSPYLCAFAFLSLLFLAVIIKYDV 141 



2— LFDLSKIPVFKQPAPSKIGQGSIQNIVLIMGESESAAHLKLFGYGRETSPFL 241 
2 L + +P F+ + I VLI+GES ++ L+GY R T+P + 

2 02 AAKEHQRLLS I ANTVPYFQL SVRDTGIDTYVLIVGESVRVDNMSLYGYTRSTTPQV 257 

TRLSQADFKPIVKQSYSAGFMTAVSLP S FFNVI PHANGLEQI S GGDTNMFRLAKEQG 2 98 

+Q + Q+ S TA+S+P + +V+ H I N+ +A + G 

E — AQRKQIKLFNQAISGAPYTALSVPLSLTADSVLSH DIHNYPDNI INMANQAG 310 

YETYFYSAQA ENQMAILNLIGKKWIDHLIQPTQLGYGNGDNMPDEKLLPLFDKINLQ 355 

++T++ S+Q+ +N A+ ++ ++ -f y G DE LLP + Q 
FQT FWL S SQSAFRQNGT AVT SI AMRAMETVYVRGF DELLLPHLS QALQQ 359 

— QGRHFIVLHQRGSHAPYGALLQPQDKVFGEADIVDK-YDNTIHKTDQMIQTVFEQLQK 412 
Q + IVLH GSH P + VF D D YDN+IH TD ++ VFE L+ 
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Sbjct: 419 --DRRASVMYFADHGLERDPTKKNVYFHGGREASQQAYHVPMFIWYSP 464 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 37 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 311>:

1 . . -ACCCTGCTCC TCTTCATCCC CCTCGTCCTC ACAC.GTGCG GCACACTGAC 

51 CGGCATACTC GCCCaCGGCG GCGGCAAACG CTTTGCCGTC GAACAAGAAC 

101 TCGTCGCCGC ATCGTCCCGC GCCGCCGTCA AAGAAATGGA TTTGTCCGCC 

151 yTAAAAGGAC GCAAAGCCGC CyTTTACGTC TCCGTTATGG GCGACCAAGG 

201 TTCGGGCAAC ATAAGCGGCG GACGCTACTC TATCGACGCA CTGATACGCG 

251 GCGGCTACCA CAACAACCCC GAAAGTGCCA CCCAATACAG CTACCCCGCC 

301 TACGACACTA CCGCCACCAC CAAATCCGAC GCGCTCTCCA GCGTAACCAC 

351 TTCCACATCG CTTTTGAACG CCCCCGCCGC CGyCyTGACG AAAAACAGCG 

401 GACGCAAAGG CGAACGcTCC GCCGGACTGT CCGTCAACGG CACGGGCGAC

451 TACCGCAACG AAACCCTGCT CGCCAACCCC CGCGACGTTT CCTTCCTGAC

501 CAACCTCATC CAAACCGTCT TCTACCTGCG CGGCATCGAA GTCgTACCGC 

551 CCGrATACGC CGACACCGAC GTATTCGTAA CCGTCGACGT A. . . 

This corresponds to the amino acid sequence <SEQ ID 312; ORF83>: 

1 ..TLLLFIPLVL TXCGTLTGIL AHGGGKRFAV EQELVAASSR AAVKEMDLSA
51 LKGRKAAXYV SVMGDQGSGN ISGGRYSIDA LIRGGYHNNP ESATQYSYPA
101 YDTTATTKSD ALSSVTTSTS LLNAPAAXLT KNSGRKGERS AGLSVNGTGD
151 YRNETLLANP RDVSFLTNLI QTVFYLRGIE VVPPXYADTD VFVTVDV..

Further work revealed the complete nucleotide sequence <SEQ ID 313>:

1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGATTTG

151 TCCGCCCTAA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA

251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC

301 CCCGCCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT

351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA

401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG

451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT

501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG

551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC

601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT

651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA

701 AACTGCTGAT TACCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA

751 CAATACGCCC TTTGGACCGG CCCTTACAAA GTCAGCAAAA CCGTCAAAGC

801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATTACCCCC TACGGCGACA
851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC
901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA



This corresponds to the amino acid sequence <SEQ ID 314; ORF83-1>:

1 MKTLLLLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY

101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLITPK TAAYESQYQE

251 QYALWTGPYK VSKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP
301 DVGNEVIRRR KGG*
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Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF83 shows 96.4% identity over a 197aa overlap with an ORF (ORF83a) from strain A of N. 
meningitidis: 



TLLL FIPLVLTX CGTLTGILAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX 
Ml : | | | | | I I I I I I I I I I I M I I I I I I I I I M I I I I 1 I I I I I I I I I I M I M I 
MKTLLXLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 



10 



20 



30 



40 



50 



110 



60 



60 70 80 90 100 

YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 

M I I I I I I I I I I I I II I I I I I M I I M I I I I I I I II I I I I I I I I I I I I I I I 1 I ! I ! I I I I 
YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 
70 80 90 100 110 120 

120 130 140 150 160 170 

TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 
I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II M I I I I I I I I I I I I I I I I I 
TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 
130 140 150 160 170 180 



180 190 
orf 83 . pep IEVVPPXYADTDVFVTVDV 
MINI I I I I I I II I I I I 

orf 83a IEVVPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 
190 200 210 220 230 240 

The complete length ORF83a nucleotide sequence <SEQ ID 315> is: 

1 ATGAAAACCC TGCTCNTCCT CATCCCCCTC GTCCTCACAG CCTGCGGCAC 

51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC 

101 AAGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG 

151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA 

201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCTATC GACGCACTGA
251 TACGCGGCGG CTACCACAAC AACCCCGAAA GTGCCACCCA ATACAGCTAC
301 CCCGCCTACG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCAGCGT
351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA
401 ACAGCGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG
451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT
501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG
551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC
601 GGCACCGTCC GCAGCCGCAC CGAACTGCAC CTCTACAACG CCGAAACCCT
651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTTGACCGC GACAGCCGGA
701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA
751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC
801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA
851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAAACCC
901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA

This encodes a protein having amino acid sequence <SEQ ID 316>: 



1 MKTLLXLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPESATQYSY

101 PAYDTTATTK SDALSSVTTS TSLLNAPAAA LTKNSGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE

251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKKP

301 DVGNEVIRRR KGG*

ORF83a and ORF83-1 show 98.4% identity in 313 aa overlap: 



10 20 30 40 50 60 

orf 83a . pep MKT LLXL I PLVLTACGTLTGI PAHGGGKRFAVEQELVAAS SRAAVKEMDLS ALKGRKAAL 
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| M I I II I I II II 1 I I I I I I I I M II 1 I I I I I M II I I I I M II 

MKTLLLL I PLVLTACGTLTG I PAHGGGKRFAVEQELVAAS SRAAVKEMDLS ALKGRKAAL 



10 



20 



30 



40 



60 



70 80 90 100 110 120 

orf83a oep yvSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 

| | | M I I I I I I II I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf83-l YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf83a pep TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 

|| | | M I I I I I I I M M I I I I I II I M I I I I I I I I I I M I I I I I I I II I I I 

orf 83-1 TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf 83a pep IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 

| M I I I II I I I I I I M I I I I I I I I I I M I I I II I II M M I I I I I I I I I ■• I 1 

orf 83-1 IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKftQTKLEYFAVDRDSRKLLITPK 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 83a pep TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP 
| | I I I I I I I I I I I I I I I I : I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 
orf 83-1 TAAYES QYQEQYALWTGPYKVSKTVKASDRLMVDFS D IT PYGDTTAQNRP DFKQNNGKKP 

250 260 270 280 290 300 

310 

or f 8 3a . pep DVGNEVIRRRKGGX 
I I I I I I I I I I I I I I 
orf83-l DVGNEVIRRRKGGX 
310 
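
The identity figures quoted for each comparison can be recomputed from a pair of aligned rows. The sketch below is one plausible reading of those figures, not the alignment program's own definition: it assumes the "overlap" is the number of aligned columns (gap columns included) and that a column counts as identical only when both residues match and neither is a gap. The function name `percent_identity` is illustrative.

```python
def percent_identity(a, b, gap="-"):
    """Return (identical, overlap, percent) for two pre-aligned rows
    of equal length; gap columns count toward the overlap only."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same length")
    identical = sum(1 for x, y in zip(a, b) if x == y and x != gap)
    overlap = len(a)
    return identical, overlap, round(100.0 * identical / overlap, 1)


if __name__ == "__main__":
    # Residues 251-300 of ORF83a and ORF83-1 as listed in this document.
    orf83a = "QYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP"
    orf83_1 = "QYALWTGPYKVSKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP"
    print(percent_identity(orf83a, orf83_1))
```

For the 50-residue segment shown, the two rows differ at three positions, so this segment alone is 94.0% identical; the figure quoted for the full alignment covers all 313 columns.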

Homology with a predicted ORF from N. gonorrhoeae

ORF83 shows 94.9% identity over a 197aa overlap with a predicted ORF (ORF83.ng) from N. gonorrhoeae:



40 



orf83 .pep 
orf83ng 



TLLLFIPLVLTXCGTLTGILAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAX 58 

I | 1 | : I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I M I ! M I I 

MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 60 

YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 118 
I I I I I I I I I I II II I I I I I I I I I I I I I I M I I : I I I : I I I I I I I I I I I I I I I I I I : I I I I 

YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS 120 

TSLLNAPAAXLTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 178 
I I M I I I I I I I I I : I I I I I I I I I I II I I I I I M I I I I I I I I I I I I I I I I I I I M I I I I I 

T S LLNAPAAALTKNNGRKGERS AGLS VNGT GDYRNETLLANPRDVS FLTNL IQTVFYLRG 180 

IEWPPXYADTDVFVTVDV 197 
I I I I! I I I I I I I I I I I I 1 

IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 24 0 



The complete length ORF83ng nucleotide sequence <SEQ ID 317> is:

1 ATGAAAACCC TGCTCCTCCT CATCCCCCTC GTACTCACCG CCTGCGGCAC
51 ACTGACCGGC ATACCCGCCC ACGGCGGCGG CAAACGCTTT GCCGTCGAAC
101 AGGAACTCGT CGCCGCATCG TCCCGCGCCG CCGTCAAAGA AATGGACTTG
151 TCCGCCCTGA AAGGACGCAA AGCCGCCCTT TACGTCTCCG TTATGGGCGA
201 CCAAGGTTCG GGCAACATAA GCGGCGGACG CTACTCCATC GACGCACTGA
251 TACGCGGCGG CTACCACAAC AACCCCGACA GCGCCACCCG ATACAGCTAC
301 CCCGCCTATG ACACTACCGC CACCACCAAA TCCGACGCGC TCTCCGGCGT
351 AACCACTTCC ACATCGCTTT TGAACGCCCC CGCCGCCGCC CTGACGAAAA
401 ACAACGGACG CAAAGGCGAA CGCTCCGCCG GACTGTCCGT CAACGGCACG
451 GGCGACTACC GCAACGAAAC CCTGCTCGCC AACCCCCGCG ACGTTTCCTT
501 CCTGACCAAC CTCATCCAAA CCGTCTTCTA CCTGCGCGGC ATCGAAGTCG
551 TACCGCCCGA ATACGCCGAC ACCGACGTAT TCGTAACCGT CGACGTATTC
601 GGCACCGTCC GCAGCCGTAC CGAACTGCAC CTCTACAACG CCGAAACCCT
651 TAAAGCCCAA ACCAAGCTCG AATATTTCGC CGTCGACCGC GACAGCCGGA
701 AACTGCTGAT TGCCCCTAAA ACCGCCGCCT ACGAATCCCA ATACCAAGAA
751 CAATACGCCC TCTGGATGGG ACCTTACAGC GTCGGCAAAA CCGTCAAAGC
801 CTCAGACCGC CTGATGGTCG ATTTCTCCGA CATCACCCCC TACGGCGACA
851 CAACCGCCCA AAACCGTCCC GACTTCAAAC AAAACAACGG TAAAAACCCC
901 GATGTCGGCA ACGAAGTCAT CCGCCGCCGC AAAGGAGGAT AA

This encodes a protein having amino acid sequence <SEQ ID 318>:

1 MKTLLLLIPL VLTACGTLTG IPAHGGGKRF AVEQELVAAS SRAAVKEMDL

51 SALKGRKAAL YVSVMGDQGS GNISGGRYSI DALIRGGYHN NPDSATRYSY

101 PAYDTTATTK SDALSGVTTS TSLLNAPAAA LTKNNGRKGE RSAGLSVNGT

151 GDYRNETLLA NPRDVSFLTN LIQTVFYLRG IEVVPPEYAD TDVFVTVDVF

201 GTVRSRTELH LYNAETLKAQ TKLEYFAVDR DSRKLLIAPK TAAYESQYQE

251 QYALWMGPYS VGKTVKASDR LMVDFSDITP YGDTTAQNRP DFKQNNGKNP

301 DVGNEVIRRR KGG*

ORF83ng and ORF83-1 show 97.1% identity in 313 aa overlap:



40 



orf 83-1 .pep 
orf83ng 



rf 83-1. pep 

rf83ng 



orf 83-1 .pep 
orf 83ng 



MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 
MKTLLLLIPLVLTACGTLTGIPAHGGGKRFAVEQELVAASSRAAVKEMDLSALKGRKAAL 



10 



70 



20 



80 



30 



40 



50 



60 



90 100 110 120 

YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPESATQYSYPAYDTTATTKSDALSSVTTS 
! I M M M M I II II II I I I I M If M 1 1 I I I : I I I : I I I I I I I I I I I I I f II ! I : | | | | 
YVSVMGDQGSGNISGGRYSIDALIRGGYHNNPDSATRYSYPAYDTTATTKSDALSGVTTS 
70 80 90 100 110 120 

130 140 150 160 170 180 

TSLLNAPAAALTKNSGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 
I 1 I M I I I I I I I I I : I I I I I I I I I I I I I II I 1 I I I I I I I I I I I I I I I I I I I I I I || I I I | 
TSLLNAPAAA1TKNNGRKGERSAGLSVNGTGDYRNETLLANPRDVSFLTNLIQTVFYLRG 

130 140 150 160 170 180 

190 200 210 220 230 240 

IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLITPK 

I I I 1 I I I I I I I I I I II I I I II I I II 1 ||:]| 

IEWPPEYADTDVFVTVDVFGTVRSRTELHLYNAETLKAQTKLEYFAVDRDSRKLLIAPK 

190 200 210 220 230 240 

250 260 270 280 290 300 

TAAYESQYQEQYALWTGPYKVSKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKKP 
I I I M I I I I I I I I I I II I : I : I I I I II N II I I I I I I I I I I I M I I I I | | | I | | | | | : j 
TAAYESQYQEQYALWMGPYSVGKTVKASDRLMVDFSDITPYGDTTAQNRPDFKQNNGKNP 

250 260 270 280 290 300 



orf 83-1 .pep 
orf 83ng 



Based on this analysis, including the presence of a putative ATP/GTP-binding site motif A (P-loop) 
in the gonococcal protein (double-underlined) and a putative prokaryotic membrane lipoprotein 
lipid attachment site (single-underlined), it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
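
Sequence motifs of the kind mentioned above are usually written as PROSITE-style patterns and can be located with a regular expression. The sketch below assumes the commonly cited form of the ATP/GTP-binding site motif A (P-loop) pattern, [AG]-x(4)-G-K-[ST]; that pattern is not stated in this text, and the names `P_LOOP` and `find_motif` are illustrative. The example input is the first 20 residues of SEQ ID 320 from Example 38, used only as a short test string.

```python
import re

# Assumed P-loop pattern [AG]-x(4)-G-K-[ST], wrapped in a lookahead
# so that overlapping occurrences are all reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_motif(protein, pattern=P_LOOP):
    """Return (1-based start position, matched segment) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein)]


if __name__ == "__main__":
    print(find_motif("MAEICLITGTPGSGKTLKMV"))
```

A match reported this way is only a pattern hit; whether it corresponds to a functional nucleotide-binding site is a separate question.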
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Example 38 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 319>:

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT

51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA 

101 AAGCCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG 

151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG 

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC 

501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA 

551 AAGTTTATGA CTTGTAysrr TmmGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCa GTAATAGTAT TGCTGATTCC 

651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GagCaGTTAC GGAAAAAAAC 

701 aGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA

751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC

801 AGATATGTTT GTTCCGACAT TGTCCGAaAA ACCCGrAAGC AAGCcgaTTT 

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA 

901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCaTCAAG GGACGGCATt 

951 gaAAGAAGTG ACGGaGTTGA TGTGcgaAgG aCTATGTaAA AAacGGCTTG 

1001 CCGTTTAACC CaTACAAAGA AGAAAGCCAA GGGCAGGAAG TTCAGCAAAG 

1051 CGCGCAgCAA CATTCGGACA GGGCG£CAAG TTGCCACATT GGGCGGAAAA 

1101 CCGTAGCAGA ACCTAATGTA CGATAATTGG GAAGAACGCG GGAAACCGTT 

1151 TGAAGGAATC GGaCGGGGGC GTGGTCGGAT CGGCAAACTG A 

This corresponds to the amino acid sequence <SEQ ID 320; ORF84>: 

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDEKAIRRKV FTNIKGLKIP

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYX XAEVHTVNKV

201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV

251 LPDKTEGEPV NNGNLTADMF VPTLSEKPXS KPIYNGVRQV RTFEYIAGCI

301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS

351 AQQHSDRAQV ATLGGKPXQN LMYDNWEERG KPFEGIGGGV VGSAN*

Further work revealed the complete nucleotide sequence <SEQ ID 321>: 

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CGAATGATGA AATGTTTAAG CCTGATGAAA 

101 ACGGCATACG CCGTAAAGTA TTTACGAACA TAAAAGGCTT GAAAATACCG 

151 CACACCTACA TAGAAACGGA CGCAAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG
351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGTCCT AAGCTTCTAG

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC
451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC
501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA
551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC
601 AAGCGGTCAA AGTGGTTTTA CACTCTGCCA GTAATAGTAT TGCTGATTCC
651 CGTGTTTGTC GGCCTGTCCT ATAAAATGTT GAGCAGTTAC GGAAAAAAAC
701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA

751 CTTCCGGATA AAACAGAAGG CGAGCCGGTA AATAACGGCA ACCTTACCGC

801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT
851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTATA
901 GAAGGCGGAA GAACCGGATG CGCCTGCTAT TCGCATCAAG GGACGGCATT
951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC 

1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC 

1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACATTGG GCGGAAAACC 

1101 GTAGCAGAAC CTAATGTACG ATAATTGGGA AGAACGCGGG AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA 
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25 



This corresponds to the amino acid sequence <SEQ ID 322; ORF84-1>:

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VRKHYHIASN

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV

201 KRSKWFYTLP VIVLLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEQQAV

251 LPDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCI 

301 EGGRTGCACY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS 

351 AQQHSDRAQV ATLGGKP*QN LMYDNWEERG KPFEGIGGGV VGSAN* 
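
The listing above shows a stop symbol (*) at residue 368, ahead of the terminal stop; it corresponds to the in-frame TAG at nucleotides 1102-1104 of SEQ ID 321. The following sketch locates such in-frame stop codons in a nucleotide sequence. It assumes reading frame 1 and the three standard stop codons; the name `in_frame_stops` is illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(dna):
    """Return the 1-based codon numbers of all stop codons in frame 1."""
    dna = "".join(ch for ch in dna.upper() if ch.isalpha())
    return [
        i // 3 + 1
        for i in range(0, len(dna) - 2, 3)
        if dna[i:i + 3] in STOP_CODONS
    ]


if __name__ == "__main__":
    # Nucleotides 1099-1110 of SEQ ID 321 (codons 367-370): CCG TAG CAG AAC.
    print(in_frame_stops("CCGTAGCAGAAC"))
```

Run on a complete open reading frame, a result containing anything other than the final codon number flags an internal stop that deserves a second look at the underlying sequence.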

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF84 shows 93.9% identity over a 395aa overlap with an ORF (ORF84a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf84 pep MAEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK 

I I I I I 1 I ! I I I ! I I I I I I I I I I I:: M I I I I I I I 1 I I I I I I I I I I M I I I 

orf 84a MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 8 4. pep LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

I I ! M I I I M I I I I I I i I I I 1 I I I I I I M I I I I I I I I I I I I I 1 I I I I I I I 1 I I I I I I M I 

orf 84a LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDWPARSAGSKIPENVQWLNTHRHQG 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf84.pep IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 II 

orf 84a IDIFVLTQGSKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 
30 130 140 150 160 170 180 

190 200 210 220 230 240 

or f 8 4 . pep LDKKVYDLYXXAEVHTVNKVKRSKW FYTLPVIVLLIPVFVGL SYKMLSSYGKKQEEPAAQ 
i M M II I I I I I I II I I I I I I I I I I II I I I : I I I I I II I I I I I I I I I I I I I I I I I M I 
35 orf 8 4a LDKKVYDLYESAEVHTVNKVKRSBCW FYTLPVIILLIPVFVGL SYKMLSSYGKKQEEPAAQ 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 84 . pep ESAATEQQAVLPDKTEGE PVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI 

40 I I I I I I : I I I : I I I I II I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I : 

orf 84a ESAATEHQAVFQDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV 
250 260 270 280 290 300 

310 320 330 340 350 360 

45 orf 84 .pep EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 

I I I I II ! : ! I I I I I I I M I : I : I I I I I :: I I I I I I I I I I I I M :: I I I I I : I I I I II 

orf 84a EGGRTGCTCYSHQGTALKEITKEMCKDYARNGLPFNPYKEESQGRDVQQSEQHHSDRPQV 
310 320 330 340 350 360 

50 370 380 390 

orf 8 4 .pep ATLGGKPXQNLMYDNWEERGKPFEGIGGGWGSANX 
I I I I I I I I I I I 1 I I I : I I I I I I I I I I I I I I I I I I I 
orf 8 4a AT LGGKPWQNLMY DNWQERGKP FE GI GGGWG S ANX 

370 380 390 

The complete length ORF84a nucleotide sequence <SEQ ID 323> is:

1 ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 

51 AAAAATGGTT TCCATGATGG CAAACGATGA AATGTTTAAG CCGGATGAAA 

101 ACGGCATACG CCGTAAAGTA TTTACGAACA TCAAAGGCTT GAAGATACCG 

151 CACACCTACA TAGAAACGGA CGCGAAAAAG CTGCCGAAAT CGACAGATGA 

201 GCAGCTTTCG GCGCATGATA TGTACGAATG GATAAAGAAG CCCGAAAATA 

251 TCGGGTCTAT TGTCATTGTA GATGAAGCTC AAGACGTATG GCCGGCACGC 

301 TCGGCAGGTT CAAAAATCCC TGAAAATGTC CAATGGCTGA ATACGCACAG 

351 ACATCAGGGC ATTGATATAT TTGTTTTGAC TCAAGGCTCT AAGCTTCTAG 

401 ATCAAAATCT TAGAACGCTT GTACGGAAAC ATTACCACAT CGCTTCAAAC 

451 AAGATGGGTA TGCGTACGCT TTTAGAATGG AAAATATGCG CGGACGATCC 

501 CGTAAAAATG GCATCAAGCG CATTCTCCAG TATCTATACA CTGGATAAAA 

551 AAGTTTATGA CTTGTACGAA TCAGCGGAAG TTCATACCGT AAATAAGGTC 

601 AAGCGGTCAA AATGGTTTTA TACTCTGCCA GTAATAATAT TGCTGATTCC 

651 CGTTTTTGTC GGCCTGTCCT ATAAAATGTT AAGTAGTTAT GGAAAAAAAC 

701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA TCAGGCAGTA 

751 TTTCAGGATA AAACAGAAGG CGAGCCGGTA AACAACGGTA ACCTTACCGC 

801 AGATATGTTT GTTCCGACAT TGTCCGAAAA ACCCGAAAGC AAGCCGATTT 

851 ATAACGGTGT AAGGCAGGTA AGAACCTTTG AATATATAGC AGGCTGTGTA 

901 GAAGGCGGAA GAACCGGATG CACATGCTAT TCGCATCAAG GGACGGCATT 

951 GAAAGAAATT ACAAAGGAAA TGTGCAAGGA TTACGCAAGA AACGGATTGC 

1001 CGTTTAACCC ATATAAAGAA GAAAGCCAAG GGCGGGATGT CCAGCAAAGT 

1051 GAGCAGCACC ATTCGGACAG ACCGCAAGTT GCCACGTTGG GCGGAAAGCC 

1101 GTGGCAAAAT CTTATGTATG ATAATTGGCA GGAGCGCGGA AAACCGTTTG 

1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA 

This encodes a protein having amino acid sequence <SEQ ID 324>: 



1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGIRRKV FTNIKGLKIP 

51 HTYIETDAKK LPKSTDEQLS AHDMYEWIKK PENIGSIVIV DEAQDVWPAR 

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGS KLLDQNLRTL VRKHYHIASN 

151 KMGMRTLLEW KICADDPVKM ASSAFSSIYT LDKKVYDLYE SAEVHTVNKV 

201 KRSKWFYTLP VIILLIPVFV GLSYKMLSSY GKKQEEPAAQ ESAATEHQAV 

251 FQDKTEGEPV NNGNLTADMF VPTLSEKPES KPIYNGVRQV RTFEYIAGCV 

301 EGGRTGCTCY SHQGTALKEI TKEMCKDYAR NGLPFNPYKE ESQGRDVQQS 

351 EQHHSDRPQV ATLGGKPWQN LMYDNWQERG KPFEGIGGGV VGSAN* 
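
The relationship between a nucleotide listing and the protein it encodes can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the function name `translate` and the variable names are illustrative and are not part of the application. It translates the first row and the final two groups of SEQ ID 323 and compares well with the start and end of SEQ ID 324.

```python
# Standard genetic code, codons ordered by base T, C, A, G (assumed here;
# the application does not state which translation table was used).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; '*' is a stop codon, 'X' an unreadable codon."""
    seq = "".join(dna.split()).upper()  # drop the spaces between 10-base groups
    residues = []
    for pos in range(0, len(seq) - 2, 3):  # trailing 1-2 bases are ignored
        residues.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(residues)


# Row 1 of SEQ ID 323 (positions 1-50) and its last two groups (1171-1188).
first_row = "ATGGCAGAGA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT"
last_groups = "GTCGGATCGG CAAACTGA"

print(translate(first_row))    # MAEICLITGTPGSGKT
print(translate(last_groups))  # VGSAN*
```

Position 1171 is the first base of a codon, which is why the final two groups can be translated on their own.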

ORF84a and ORF84-1 show 95.2% identity in 395 aa overlap: 



MAEICLITGT PGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 
! I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | M M | | | | | I 1 | | | | || | | i 
MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 
10 20 30 40 50 60 

70 80 90 100 110 120 

LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 
( I I I I I I I I I I I I 1 I I I I I I I I I I I M I I I I I I I M I I i I I I I I II I I I M I I [ I I M II 
LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

70 80 90 100 110 120 

130 140 150 160 170 180 

IDIFVLTQGSKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 
I i I M I I I I I I I I I I I I I I I I I I I I I I I I I I I | || | | | | | | || | | | || || | | | | | | | | | 
IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 

130 140 150 160 170 180 

190 200 210 220 230 240 

LDKKVYDLYE SAEVHTVNKVKRSKWFYTLPVIILLIPVFVGLSYKMLSSYGKKQEEPAAQ 
I N M N I I I I II I I I I I I I I I I I I I I | | | | | : M I II I I I I I I I I I I I I I I I || | | [ | | 
LDKKVYDLYESAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ 

190 200 210 220 230 240 

250 260 270 280 290 300 

ESAATEHQAVFQDKTEGE PVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCV 
N N I ! : I I ! : I II I! M I I M I II I M i M I I I I I I M II II M M M M I II I I M : 
ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRTFEYIAGCI 

250 260 270 280 290 300 

310 320 330 340 350 360 

EGGRTGCTCYSHQGTALKEITKEMCKDYARNGLPFNPYKEESQGRDVQQSEQHHSDRPQV 
I I I I I I I : I I I i I I I I I I I : I : I I I I I :: I I I I I I I I M I I I I I I I I I : I I I | | | 
EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 

310 320 330 340 350 360 



370 380 390 
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orf84a pep ATLGGKPWQNLMYDNWQERGKPFEGIGGGWGSANX 
I I M M ! I I I I I I i I : I I M I I I I I I I I 1 I 1 I M 1 
orf84-l ATLGGKPXQNLMYDNWEERGKPFEGIGGGVVGSANX 
370 380 390 
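
The identity figures quoted for these alignments are the share of aligned positions that carry the same residue. The sketch below illustrates that arithmetic on a 20-residue window; it assumes the two sequences are already aligned to equal length, and `percent_identity` is a hypothetical helper rather than the alignment program used for the application, which is not named in this section.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned columns with identical residues.

    Both inputs must already be aligned (equal length); '-' marks a gap
    and never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * matches / len(aligned_a)


# Residues 121-140 of ORF84a and ORF84-1, which differ at position 130 (S vs P).
orf84a_window = "IDIFVLTQGSKLLDQNLRTL"
orf84_1_window = "IDIFVLTQGPKLLDQNLRTL"

print(percent_identity(orf84a_window, orf84_1_window))  # 95.0
```

Applied to the full 395-residue overlap instead of a window, the same count gives the overall figure; the exact value depends on how the alignment program treats gaps and ambiguous residues such as X.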

Homology with a predicted ORF from N. gonorrhoeae 

ORF84 shows 94.2% identity over a 395aa overlap with a predicted ORF (ORF84.ng) from N. 
gonorrhoeae: 

orf84 pep MAEICLITGTPGSGKTLKMVSMMANDEMFKPDEKAIRRKVFTNIKGLKIPHTYIETDAKK 60 

I i I I I I M I I II I I I M I I I I I II I I I I I I I I I ::: M I I I I I I : I I I I I M 

orf84nq maeiclITGTPGSGKTLKMVSMMANDEMFKPDENGVRRKVFTNIKGLKIPHTHIETDAKK 60 



orf 84 .pep 
orf 84ng 
orf 84 -pep 
orf 84ng 
orf 84 -pep 
orf84ng 
orf 84 .pep 
orf 84ng 
orf 84 .pep 
orf84ng 



LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 120 
I I I I I I I M I II I I I I I I I I I 1 I : I : I M II I I I I I I I II I I M I I I I I I I I I I I M I I I 

LPKSTDEQLSAHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 12 0 

I D I FVLTQGPKLLDQNLRTLVRKHYHI ASNKMGMRTLLEWKI CADDPVKMAS SAFS S I YT 18 0 

I I I I I I : : M I I I : II I I : I I : I M II I I I I I I I I I I I I I 

I D I FVLTQGPKLLDQNLRT LVKRHYHI AANKMGLRTLLEWKVCADD PVKMAS SAFS S I YT 180 

LDKKVYDLYXXAEVHTVNKVKRSKW FYTL PVIVLL I PVFVGLS YKMLS S YGKKQEE PAAQ 24 0 
| | | | | | M I I I : M I I II M I I I I I : II I I : II M : II I I I I I I I : I I I I I I I M II I 
LDKKVYDLYE SAEIHTVNKVKRSKWFYAL PVIILL I PLFVGLS YKMLGS YGKKQEE PAAQ 24 0 

ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPXSKPIYNGVRQVRTFEYIAGCI 300 
I I I I I I I I I II I I I I I I I I I I II I I I I I I I M I III I 1 I I II II M M I I I I I I I II 
ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGCI 300 

EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 360 

I I I I I 1 I : I I I I I I I I I I M I I I I II I II I II II I I I I I I I I I I I I I II I I I I II 

EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 360 

ATLGGKPXQNLMYDNWEERGKPFEGIGGGWGSAN 395 

n 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ii 1 1 1 1 1 1 1 1 1 1 

ATLGGKPQQNLMYDNWEERGKPFEGIGGGWGSAN 395 



The complete length ORF84ng nucleotide sequence <SEQ ID 325> is: 



1 ATGGCAGAAA TCTGTTTGAT AACCGGCACG CCCGGTTCAG GGAAAACATT 
51 AAAAATGGTT TCCATGATGG CAAACGATGA AATGTTTAAG CCAGATGAAA 
101 ACGGCGTACG CCGTAAAGTA TTTACGAACA TCAAAGGTTT GAAGATACCG 
151 CACACCCACA TAGAAACAGA CGCAAAGAAG CTGCCGAAAT CAACCGATGA 
201 ACAGCTTTCG GCGCATGATA TGTATGAATG GATCAAGAAG CCTGAAAacg 
251 tcggcgCAAT CGTTATTGTC GATGAGGCGC AAGACGTATG GCCCGCACGC 
301 TccgCAGGTT CGAAAATCCC CGAAAACGTC CAATGGCTGA ACACACACAG 
351 GCATCAGGGC ATAGATATAT TTGTATTGAC ACAAGGTCCT AAACTCTTAG 
401 ATCAGAACTT GCGAACATTG GTTAAAAGAC ATTACCACAT TGCGGCCAAC 
451 AAAATGGGTT TGCGTACCCT GCTTGAATGG AAAGTATGCG CGGATGACCC 
501 GGTAAAAATG GCATCAAGTG CATTTTCCAG TATCTACACA CTGGATAAAA 
551 AAGTTTATGA CTTGTACGAA TCCGCAGAAA TTCACACGGT AAACAAAGTC 
601 AAGCGTTCAA AATGGTTTTA TGCATTGCCC GTCATCATAT TATTGATTCC 
651 GCTATTTGTC GGTTTGTCTT ACAAAATGTT GGGCAGTTAC GGAAAAAAAC 
701 AGGAAGAACC CGCAGCACAA GAATCGGCGG CAACAGAACA GCAGGCAGTA 
751 CTTCCGGATA AAACAGAAGG AGAATCGGTG AATAACGGAA ACCTTACGGC 
801 AGATATGTTT GTTCCGACAT TGCCCGAAAA ACCCGAAAGC AAGCCGATTT 
851 ATAACGGTGT AAGGCAGGTA AGGACCTTTG AATATATAGC AGGCTGTATA 
901 GAAGGCGGAA GAACCGGATG CACCTGCTAT TCGCATCAAG GGACGGCATT 
951 GAAAGAAGTG ACGGAGTTGA TGTGCAAGGA CTATGTAAAA AACGGCTTGC 
1001 CGTTTAACCC ATACAAAGAA GAAAGCCAAG GGCAGGAAGT TCAGCAAAGC 
1051 GCGCAGCAAC ATTCGGACAG GGCGCAAGTT GCCACCTTGG GCGGAAAACC 
1101 GCAGCAGAAC CTAATGTACG ACAATTGGGA AGAACGCGGG AAACCGTTTG 
1151 AAGGAATCGG CGGGGGCGTG GTCGGATCGG CAAACTGA 



This encodes a protein having amino acid sequence <SEQ ID 326>: 

1 MAEICLITGT PGSGKTLKMV SMMANDEMFK PDENGVRRKV FTNIKGLKIP 

51 HTHIETDAKK LPKSTDEQLS AHDMYEWIKK PENVGAIVIV DEAQDVWPAR 

101 SAGSKIPENV QWLNTHRHQG IDIFVLTQGP KLLDQNLRTL VKRHYHIAAN 

151 KMGLRTLLEW KVCADDPVKM ASSAFSSIYT LDKKVYDLYE SAEIHTVNKV 

201 KRSKWFYALP VIILLIPLFV GLSYKMLGSY GKKQEEPAAQ ESAATEQQAV 

251 LPDKTEGESV NNGNLTADMF VPTLPEKPES KPIYNGVRQV RTFEYIAGCI 

301 EGGRTGCTCY SHQGTALKEV TELMCKDYVK NGLPFNPYKE ESQGQEVQQS 

351 AQQHSDRAQV ATLGGKPQQN LMYDNWEERG KPFEGIGGGV VGSAN* 

ORF84ng and ORF84-1 show 95.4% identity in 395 aa overlap: 

10 20 30 40 50 60 

orf 84-1. pep MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIPHTYIETDAKK 
I I I I I I I I I I I I I I I i I I I I I I I ! I 1 I I I I I I I I I : I I I I I I 1 I I I I I I I I 1 : I I I I I I I 
orf84ng MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGVRRKVFTNIKGLKIPHTHIETDAKK 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 84-1. pep LPKSTDEQLSAHDMYEWIKKPENIGSIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 
I I I I I I I I I I II I I I M I I I I I I : I : I M I I I I I M M I [ I I I I I I I II I I I I I I I [ I II 
orf84ng LPKSTDEQLS AHDMYEWIKKPENVGAIVIVDEAQDVWPARSAGSKIPENVQWLNTHRHQG 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 84-1 . pep IDIFVLTQGPKLLDQNLRTLVRKHYHIASNKMGMRTLLEWKICADDPVKMASSAFSSIYT 
I I I I I I I M I I I I I I I I II I I : : I I I 1 I : I I I I : 1 I I II I I : I I I I I I I I I I I I I I II I I 
orf84ng IDIFVLTQGPKLLDQNLRTLVKRHYHIAANKMGLRTLLEWKVCADDPVKMASSAFSSIYT 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 8 4-1. pep LDKKVYDLYE SAEVHTVNKVKRSKWFYTLPVIVLLIPVFVGLSYKMLSSYGKKQEEPAAQ 
I N I M I I I M M : I I M I I I I I I I M : I I I I : I I I I : I I I I I I I I I : I I I I I I I I I I I I 
orf 84ng LDKKVYDLYE SAEIHTVNKVKRSKWFYALPVI I LLIPLFVGLSYKMLGSYGKKQEEPAAQ 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 8 4-1 .pep ESAATEQQAVLPDKTEGEPVNNGNLTADMFVPTLSEKPESKPIYNGVRQVRT FEYIAGCI 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I | I || | | | | | | | | | ! | | | 
orf 8 4ng ESAATEQQAVLPDKTEGESVNNGNLTADMFVPTLPEKPESKPIYNGVRQVRTFEYIAGCI 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 84-1 . pep EGGRTGCACYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 
M I I I I I : I I I I I I II I I I I I I I I I I I I I I I I! I I I I I I I I I I I I I I I I I II | | | I | | | | 
orf 84ng EGGRTGCTCYSHQGTALKEVTELMCKDYVKNGLPFNPYKEESQGQEVQQSAQQHSDRAQV 

310 320 330 340 350 360 

370 380 390 

orf 8 4-1 . pep ATLGGKPXQNLMYDNWEERGKPFEGIGGGWGSANX 

N I M I 1 I I I I I II I I II I I I I I I II I I I I I I I I I 
orf 8 4ng ATLGGKPQQNLMYDNWEERGKPFEGIGGGWGSANX 
370 380 390 

Based on this analysis, including the presence of a putative transmembrane domain 
(single-underlined) in the gonococcal protein, and a putative ATP/GTP-binding site motif A (P-loop, 
double-underlined), it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
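
A motif of this kind can be located with a simple pattern search. The sketch below assumes the PROSITE consensus for the ATP/GTP-binding site motif A, [AG]-x(4)-G-K-[ST], written as a regular expression; the helper `find_p_loops` is illustrative, and the match it reports is what the pattern finds, not necessarily the exact segment underlined in the original listing.

```python
import re

# PROSITE-style consensus [AG]-x(4)-G-K-[ST] expressed as a regular expression.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched segment) for each P-loop hit."""
    return [(m.start() + 1, m.group()) for m in P_LOOP.finditer(protein)]


# First 50 residues of ORF84a (SEQ ID 324), groups joined without spaces.
orf84a_n_terminus = "MAEICLITGTPGSGKTLKMVSMMANDEMFKPDENGIRRKVFTNIKGLKIP"

print(find_p_loops(orf84a_n_terminus))  # [(9, 'GTPGSGKT')]
```

The same N-terminal segment is present in the gonococcal sequence (SEQ ID 326), so the pattern matches residues 9 to 16 there as well.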



Example 39 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 327>: 

1 GTGGTTTTCC TGAATGCCGA CAACGGGATA TTGGTTCAGG ACTTGCCTTT 
51 TGAAGTCAAA CTGAAAAAAT TCCATATCGA TTTTTACAAT ACGGGTATGC 
101 CGCGTGATTT CGCCAGCGAT ATTGAAGTGA CGGACAAGGC AACCGGTGAG 
151 AAACTCGAGC GCACCATCCG CGTGAACCAT CCTTTGACCT TGCACGGCAT 

201 CACGATTTAT CAGGCGAGTT TTGCCGACGG CGGTTCGGAT TTGACATTCA 

251 AGGCGTGGAA TTTGGGTGAT GCTTCGCGCG AGCCTGTCGT GTTGAAGGCA 

301 ACATCCATAC ACCAGTTTCC GTTGGAAATT GGCAAACACA AATATCGTCT 

351 TGAGTTCGAT CAGTTCACTT CTATGAATGT GGAGGACATG AGCGAGGGCG 

401 CGGAACGGGA AAAAAGCCTG AAATCCACGC TGCCCGATGT CCGCGCCGTT 

451 ACTCAGGAAG GTCACAAATA CACCAAT TACCG 

501 TATCCGTGAT GCGCCAGGCC AGGCGGTCGA ATATAAAAAC TATATGCTGC 

551 CGGTTTTGCA GGAACAGGAT TATTTTTGGA TTACCGGCAC GCGCAGCGC. 

601 TTGCAGCAGC AATACCGCTG GCTGCGTATC CCCTTGGACA AGCAGTTGAA 

651 AGCGGACACC TTTATGGCAT TGCGTGAGTT TTTGAAAGAT GGGGAAGGGC 

701 GCAAACGTCT .GTTGCCGAC GCAACCAAAG GCGCACCTGC CGAAATCCGC 

751 GAACAATTCA TGCTGGCTGC GGAAAACACG CTGAACATCT TTGCACAAAA 

801 AGGCTATTTG GGATTGGACG AATTTATTAC GTCCAATATC CCGAAAGAGC 

851 AGCAGGATAA GATGCAGGGC TATTTCTACG AAATGCTTTA CGGCGTGATG 

901 AACGCTGCTT TGGATGAAAC CAT.ACCCGG TACGGCTTGC CCGAATGGCA 

951 GCAGGATGAA GCGCGGAATC GTTTCCTGCT GCACAGTATG GATGCGTACA 

1001 CGGGTTTGAC CGAATATCCC GCGCCTATGC TGCTGCAACT TGATGGGTTT 

1051 TCCGAGGTGC GTTCGTCGGG TTTGCAGATG ACCCGTTCCC C.GGTCCGCT 

1101 TTTGGTCTAT CTC . . . 

This corresponds to the amino acid sequence <SEQ ID 328; ORF88>: 

1 MVFLNADNGI LVQDLPFEVK LKKFHIDFYN TGMPRDFASD IEVTDKATGE 

51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLGD ASREPVVLKA 

101 TSIHQFPLEI GKHKYRLEFD QFTSMNVEDM SEGAEREKSL KSTLPDVRAV 

151 TQEGHKYTNX XXXXXYRIRD APGQAVEYKN YMLPVLQEQD YFWITGTRSX 

201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRXVAD ATKGAPAEIR 

251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKEQQDKMQG YFYEMLYGVM 

301 NAALDETXTR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF 

351 SEVRSSGLQM TRSXGPLLVY L... 

Further work revealed the complete nucleotide sequence <SEQ ID 329>: 

1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC 

51 TTTTTTCAGC TCCATGCGCT TTGCAGTCGC TTTGCTCAGT CTGCTGGGTA 

101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT 

151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG 

201 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT 

251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG 

301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC 

351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA 

401 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA 
451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG 

501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA 
551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT 
601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT 
651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC 
701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG 
751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA 
801 TACGGGTATG CCGCGTGATT TCGCCAGCGA TATTGAAGTG ACGGACAAGG 
851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC 
901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA 
951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG 

1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC 

1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT 

1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG 

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC 

1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA 

1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA 

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC 
1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA 

1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG 
1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC 
1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT 
1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT 
1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG 
1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT 
1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC 
1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC 
1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC 

1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG 

1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG 

1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA 

2001 CTTGAATCAT GACTGA 

This corresponds to the amino acid sequence <SEQ ID 330; ORF88-1>: 

1 MSKSRRSPPL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD 

51 YLVKFGSFWA QIFGFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW 

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE 

151 DGSVLIAAKK GTMNKWGYIF AHVALIVICL GGLIDSNLLL KLGMLTGRIV 

201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGILVQ 

251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT 

301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPVVLKATSI HQFPLEIGKH 

351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS 

401 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD 

451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI 

501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL 

551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS 

601 PGALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL 

651 QKEFPKHVES LQRLGKDLNH D* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF88 shows 95.7% identity over a 371aa overlap with an ORF (ORF88a) from strain A of N. 



10 20 30 

MVFLNADNGI LVQDLP FEVKLKKFHI DFYN 

: I 1 I ! I I I I I I I I I I I I I I I Ill 

AKDFKPESILGASNLSFRGNVNISEGQSADWFLNADNGILVQDLPFEVKLKKFHIDFYN 
210 220 230 240 250 260 

40 50 60 70 80 90 

TGMPRDFASDIEVTDKATGEKLERTIRVNHPLTLHGITIYQASFADGGSDLTFKAWNLGD 

TGMPRDFASDIEVTD^TC 

270 280 290 300 310 320 

100 110 120 130 140 150 

ASREPWLKATSIHQFPLEIGKHKYRLEFDQFTSMNVEDMSEGAEREKSLKSTLPDVRAV 
I I i I i I I I I I I I I II I I I I I I I I I I M I I I I I I II I I I I I 1 I 1 I I II M M I I I I I I I I 
ASREPVVLKATSIHQFPLEIGKHKYRLEFDQFTSMNVEDMSEGAEREKSLKSTLNDVRAV 
330 340 350 360 370 380 

160 170 180 190 200 210 

TQEGHKYTNXXXXXXYRIRDAPGQAVEYKNYMLPVLQEQDYFWITGTRSXLQQQYRWLRI 
1111:1111 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I 1 I 

TQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYMLPVLQEQDYFWITGTRSGLQQQYRWLRI 
390 400 410 420 430 440 

220 230 240 250 260 270 

PLDKQLKADTFMALREFLKDGEGRKRXVADATKGAPAEIREQFMLAAENTLNIFAQKGYL 
M I I 1 II I I II I i I I I I I M M I I I I I I I I M i I I I I I I I I I M II I I I 1 I I I I I I I I I 
P LDKQLKADT FMALRE FLKDGE GRKRLVADATKGAP AE I RE QFMLAAENT LNI FAQKGYL 
450 460 470 480 490 500 

280 290 300 310 320 330 

GLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAALDETXTRYGLPEWQQDEARNRFLLHSM 
I I I I I I I I I I I I I I i I [ I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
GLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAALDETIRRYGLPEWQQDEARNRFLLHSM 
510 520 530 540 550 560 



meningitidis: 

orf88 .pep 
orf88a 

orf88 .pep 
orf88a 

orf 88 .pep 
orf88a 

orf88 .pep 
orf88a 

orf88 .pep 
orf88a 

orf 88 .pep 

orf88a 



orf 88 .pep 



340 350 360 370 

DAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRSXGP LLVYL 
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M 1 I t I I I I I I I I I I I I M > I > N I 1 I I I I I I I I Mill 
orf88a D 7s Y T^T.TBVP&PMT.T.r)T,nF;F.qKVRSSGLOMTRSPG ALLVYLGSVLLVLGTVLM FYVREKR 
570 580 590 600 610 620 

or f 8 8a AWVLFSDGKIRFAMSSARSERDLQKEFPKHVESLQRLGKDLNHDX 
630 640 650 660 670 

The complete length ORF88a nucleotide sequence <SEQ ID 331> is: 

1 ATGAGTAAAT CCCGTAGATC TCCCCCACTT CTTTCCCGTC CGTGGTTCGC 

51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA 

101 TTGCATCGGT TATCGGTACG GTGTTGCAGC AAAACCAGCC GCAGACGGAT 

151 TATTTGGTCA AATTCGGATC GTTTTGGGCG CAGATTTTTG GTTTTCTGGG 

201 ACTGTATGAC GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTT 

251 TGGTGGTTTC TACCAGTTTG TGCCTGATTC GCAATGTGCC GCCGTTCTGG 

301 CGCGAAATGA AGTCTTTTCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC 

351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCGCCC GAGGTTGCCA 

401 AACGTTATCT GGAAGTACAA GGTTTTCAGG GAAAAACCAT TAACCGTGAA 

451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCACAATGA ACAAATGGGG 

501 CTATATCTTT GCCCATGTTG CTTTGATTGT CATTTGCCTG GGCGGGTTGA 

551 TAGACAGTAA CCTGCTGTTG AAACTGGGTA TGCTGACCGG TCGGATTGTT 

601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT 

651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC 

701 AGAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT ATTGGTTCAG 

751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA 

801 TACGGGTATG CCGCGCGATT TTGCCAGTGA TATTGAAGTA ACGGATAAGG 

851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC 

901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA 

951 TTTGACATTC AAGGCGTGGA ATTTGGGTGA TGCTTCGCGC GAGCCTGTCG 

1001 TGTTGAAGGC AACATCCATA CACCAGTTTC CGTTGGAAAT TGGCAAACAC 

1051 AAATATCGTC TTGAGTTCGA TCAGTTTACT TCTATGAATG TGGAGGACAT 

1101 GAGCGAGGGC GCGGAACGGG AAAAAAGCCT GAAATCCACG CTGAACGATG 

1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC 

1201 ATTGTTTACC GTATCCGTGA TGCGGCAGGG CAGGCGGTCG AATATAAAAA 

1251 CTATATGCTG CCGGTTTTGC AGGAACAGGA TTATTTTTGG ATTACCGGCA 

1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC 

1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA 

1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GGCGCACCTG 

1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAACATC 

1501 TTTGCACAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT 

1551 CCCGAAAGAG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT 

1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG 

1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAT CGTTTCCTGC TGCACAGTAT 

1701 GGATGCGTAC ACGGGTTTGA CCGAATATCC CGCGCCTATG CTGCTGCAAC 

1751 TTGATGGGTT TTCCGAGGTG CGTTCGTCGG GTTTGCAGAT GACCCGTTCC 

1801 CCGGGTGCGC TTTTGGTCTA TCTCGGCTCG GTGCTGTTGG TATTGGGTAC 

1851 GGTATTGATG TTTTATGTGC GCGAAAAACG GGCGTGGGTA TTGTTTTCAG 

1901 ACGGCAAAAT CCGTTTTGCC ATGTCTTCGG CCCGCAGCGA ACGGGATTTG 

1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA 

2001 CTTGAATCAT GACTGA 
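
A listing in this numbered, ten-base-group layout can be turned back into a continuous sequence by discarding the position labels and the spacing. The sketch below is one way to do that and to check the expected protein length; `parse_listing` and `protein_length` are hypothetical helpers, and the length check assumes a coding sequence that ends in a single stop codon.

```python
import re


def parse_listing(listing: str) -> str:
    """Remove position labels, spaces and line breaks from a numbered listing."""
    # Sequence letters are never digits, so every digit is part of a label.
    return re.sub(r"[\d\s]+", "", listing)


def protein_length(cds_length: int) -> int:
    """Residues encoded by a coding sequence that ends in one stop codon."""
    if cds_length % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of three")
    return cds_length // 3 - 1


# Final two rows of SEQ ID 331 as printed above.
last_rows = """
1951 CAGAAGGAAT TTCCAAAACA CGTCGAGAGT CTGCAACGGC TCGGCAAGGA

2001 CTTGAATCAT GACTGA
"""

tail = parse_listing(last_rows)
total_length = 1950 + len(tail)  # row 1951 begins at position 1951

print(len(tail))                     # 66
print(total_length)                  # 2016
print(protein_length(total_length))  # 671
```

A 2016-base coding sequence therefore corresponds to a 671-residue protein, in line with the numbering of the amino acid listings for ORF88a and ORF88-1.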

This encodes a protein having amino acid sequence <SEQ ID 332>: 

1 MSKSRRSPPL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD 

51 YLVKFGSFWA QIFGFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW 

101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVQ GFQGKTINRE 

151 DGSVLIAAKK GTMNKWGYIF AHVALIVICL GGLIDSNLLL KLGMLTGRIV 

201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGILVQ 

251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT 

301 LHGITIYQAS FADGGSDLTF KAWNLGDASR EPVVLKATSI HQFPLEIGKH 

351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS 

401 IVYRIRDAAG QAVEYKNYML PVLQEQDYFW ITGTRSGLQQ QYRWLRIPLD 

451 KQLKADTFMA LREFLKDGEG RKRLVADATK GAPAEIREQF MLAAENTLNI 

501 FAQKGYLGLD EFITSNIPKE QQDKMQGYFY EMLYGVMNAA LDETIRRYGL 

551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS 

601 PGALLVYLGS VLLVLGTVLM FYVREKRAWV LFSDGKIRFA MSSARSERDL 

651 QKEFPKHVES LQRLGKDLNH D* 



ORF88a and ORF88-1 show 100.0% identity in 671 aa overlap: 
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orf8 8a pep MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60 

I M I I I I I I I M I I I I I I I I III I I I I 1 I I I I 11 M I I I I I i I M I II I I I I I I I I I I I I 

orf88-l MSKSRRSPPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60 

orf88a Dep QIFGFLGLYDVYASAWFVVIMMFLWSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 12 0 

M | [ | [ | | | | | | | | | I I 1 I I I II I I M I I I I I I M I I 1 I I I I 

orf 88-1 QiFGFLGLYDVYASAWFVVIMMFLWSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120 

orf88a pep 3SLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180 

1 I 1 I I I I I I I I I I I I I I I I I M I I I I I I I I II I I I I I I I I I I I I I I I I I II I I 

orf88-l SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180 

orf88a pep GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNWISEGQSADVVF 240 

II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 

orf 88-1 GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 2 40 

orf88a pep LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300 

M I I I I I I I I I I I I I I I I I I M I I I II I I I I I I M I I I I I I I I I I I II I I I II I I I I I I I 

orf 88-1 LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 30 0 

orf 88a pep LHGITIYQASFADGGSDLTFKAWNLGDASREPWLKATSIHQFPLEIGKHKYRLEFDQFT 3 60 

I | I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 88-1 LHGITIYQASFADGGSDLTFKAWNLGDASREPVVLKATSIHQFPLEIGKHKYRLEFDQFT 3 60 

orf88a pep SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 

I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II 

orf 88-1 SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 

orf 88a. pep PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 4 80 

II I I I I I I I I I I II I I I I I I I I I I I M I I I I I I II I I I I I I I I I I I I I I I 1 I I I I I I I I I 
orf 8 8-1 PVLQEQDYFW ITGTRS GLQQQYRWLRI PLDKQLKADT FMALRE FLKDGEGRKRLVADATK 4 80 

orf 88a . pep GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540 

I I I I I I I I I I I I I I I I I I I II I I I II I 1 I I 1 I I I II I I I I I I I I I I I I I I I I I I I I I I M 
orf 88-1 GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 54 0 

orf 88a. pep LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

! I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I ! M I I I I I I 

orf 8 8-1 LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

or f 8 8a . pep PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 6 60 

orf 8 8-1 PGALLVYLGSVLLVLGTVLMFYVREKRAW 660 

orf 88a. pep LQRLGKDLNHD 672 

I I I I I I I I I I I 
orf88-l LQRLGKDLNHD 672 

Homology with a predicted ORF from N. gonorrhoeae 

ORF88 shows 93.8% identity over a 371 aa overlap with a predicted ORF (ORF88.ng) from N. 
gonorrhoeae: 



orf 88 .pep MVFLNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNH 60 

I I I I I I I I I : I I I I I I I M I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I M II I I I I I 

orf 8 8ng MVFLNADNGMLVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNH 60 

orf88 .pep PLTLHGIT I YQAS FADGGS DLT FKAWNLGDASRE PWLKAT S IHQFPLE IGKHKYRLE FD 120 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II 
orf8 8ng PLTLHGITI YQAS FADGGSDLTFKAWNLRDAS RE PWLKAT S IHQFPLE IGKHKYRLE FD 120 

orf88 .pep QFTSMNVEDMSEGAEREKSLKSTLPDVRAVTQEGHKYTNXXXXXXYRIRDAPGQAVEYKN 180 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I 

orf8 8ng QFTSMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKN 180 

orf 88 .pep YMLPVLQEQDYFWITGTRSXLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRXVAD 240 

I I I I : I I : : I I I I : ! I I I I ! I I II I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I III 

orf88ng YMLPILQDKDYFWLTGTRSGLQQQYRWLRI PLDKQLKADT FMALREFLKDGEGRKRLVAD 240 
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orf88 DeT3 ATKGAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVM 
|M I I [ I I I I I I I I I I I I I I 1 I I I I I I I I I 1 1 M I I! I I I I I I I I I I I I I I III I I I I 
orf8 8ng ATK DAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKGQQDKMQGYFYEMLYGVM 

orf88 net) NAALDETXTRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQM 

P P mi III I Mil I II I I II II II Mil III I II II II II II II II II Mil 

orf88ng NA ALDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQM 
or f 8 8. pep TRSXGPLLVYL 

orf88ng trLgaLLVYLGSVLLVLGTVFMFYVPKKRAWVLFSNXKIRFAMSSARSERDLQKEFPKH 

An ORF88ng nucleotide sequence <SEQ ID 333> was predicted to encode a protein having amino 
acid sequence <SEQ ID 334>: 

1 MVFLNADNGM LVQDLPFEVK LKKFHIDFYN TGMPRDFASD IEVTDKATGE 

51 KLERTIRVNH PLTLHGITIY QASFADGGSD LTFKAWNLRD ASREPVVLKA 

101 TSIHQFPLEI gkhkyrlefd qftsmnvedm SEGAEREKSL KSTLNDVRAV 

151 TQEGKKYTNI GPSIVYRIRD AAGQAVEYKN YMLPILQDKD YFWLTGTRSG 

201 LQQQYRWLRI PLDKQLKADT FMALREFLKD GEGRKRLVAD ATKDAPAEIR 

251 EQFMLAAENT LNIFAQKGYL GLDEFITSNI PKGQQDKMQG YFYEMLYGVM 

301 NAALDETIRR YGLPEWQQDE ARNRFLLHSM DAYTGLTEYP APMLLQLDGF 

351 SEVRSSGLQM TRSPGALLVY LGSVLLVLGT VFMFYVPKKR AWVLFSNXKI 

401 RFAMSSARSE RDLQKEFPKH VESLQRLGKD LNHD* 



Further work revealed the complete gonococcal DNA sequence <SEQ ID 335>: 



1 ATGAGTAAAT CCCGTATATC TCCCACACTT CTTTCCCGTC CGTGGTTCGC 
51 TTTTTTCAGC TCCATGCGCT TTGCGGTCGC TTTGCTCAGT CTGCTGGGTA 
101 TTGCATCGGT TATCGGCACG GTGTTACAGC AAAACCAGCC GCAGACGGAT 
151 TATTTGGTCA AATTCGGACC GTTTTGGACT CGGATTTTTG ATTTTTTGGG 
201 TTTGTATGAT GTCTATGCTT CGGCATGGTT TGTCGTTATC ATGATGTTTC 
251 TGGTGGTTTC TACCAGTTTG TGTTTAATCC GTAACGTTCC GCCGTTTTGG 
301 CGCGAAATGA AGTCTTTCCG GGAAAAGGTT AAAGAAAAAT CTCTGGCGGC 
351 GATGCGCCAT TCTTCGCTGT TGGATGTAAA AATTGCCCCC GAAGTTGCCA 
401 AACGTTATCT GGAGGTGCGG GGTTTTCAGG GAAAAACCGT CAGCCGTGAG 
451 GACGGGTCGG TTCTGATTGC CGCCAAAAAA GGCAcaatga acaaATGGGG 
501 CTATATCTTT GCccaagtag ctTTGATTGT CATTTGCCTG GGCGGGTTGA 
551 TAGACAGTAA CCTGCTGCTG AAGCTGGGTA TGCTGGCCGG TCGGATTGTT 
601 CCGGACAATC AGGCGGTTTA TGCCAAGGAT TTCAAGCCCG AAAGTATTTT 
651 GGGTGCGTCC AATCTCTCAT TTAGGGGCAA CGTCAATATT TCCGAGGGGC 
701 AAAGTGCGGA TGTGGTTTTC CTGAATGCCG ACAACGGGAT GTTGGTTCAG 
751 GACTTGCCTT TTGAAGTCAA ACTGAAAAAA TTCCATATCG ATTTTTACAA 
801 TACGGGTATG CCGCGCGATT TTGCCAGCGA TATTGAAGTA ACGGACAAGG 
851 CAACCGGTGA GAAACTCGAG CGCACCATCC GCGTGAACCA TCCTTTGACC 
901 TTGCACGGCA TCACGATTTA TCAGGCGAGT TTTGCCGACG GCGGTTCGGA 
951 TTTGACATTC AAGGCGTGGA ATTTGAGGGA TGCTTCGCGC GAACCTGTCG 
1001 TGTTGAAGGC AACCTCCATA CACCAGTTTC CGTTGGAAAT CGGCAAACAC 
1051 AAATATCGTC TTGAGTTCGA TCAGTTCACT TCTATGAATG TGGAGGACAT 
1101 GAGCGAGGGT GCGGAACGGG AAAAAAGCCT GAAATCCACT CTGAACGATG 
1151 TCCGCGCCGT TACTCAGGAA GGTAAAAAAT ACACCAATAT CGGCCCTTCC 
1201 ATCGTGTACC GCATCCGTGA TGcggCAGGG CAGGCGGTCG AATATAAAAA 
1251 CTATATGCTG CCGATTTTGC AGGACAAAGA TTATTTTTGG CTGACCGGCA 
1301 CGCGCAGCGG CTTGCAGCAG CAATACCGCT GGCTGCGTAT CCCCTTGGAC 
1351 AAGCAGTTGA AAGCGGACAC CTTTATGGCA TTGCGTGAGT TTTTGAAAGA 
1401 TGGGGAAGGG CGCAAACGTC TGGTTGCCGA CGCAACCAAA GACGCACCTG 
1451 CCGAAATCCG CGAACAATTC ATGCTGGCTG CGGAAAACAC GCTGAATATC 
1501 TTTGCGCAAA AAGGCTATTT GGGATTGGAC GAATTTATTA CGTCCAATAT 
1551 CCCGAAAGGG CAGCAGGATA AGATGCAGGG CTATTTCTAC GAAATGCTTT 
1601 ACGGCGTGAT GAACGCTGCT TTGGATGAAA CCATACGCCG GTACGGCTTG 
1651 CCCGAATGGC AGCAGGATGA AGCGCGGAAC CGTTTCCTGC TGCACAGTAT 
1701 GGATGCCTAT ACGGGGCTGA CGGAATATCC CGCGCCTATG CTGCTCCAGC 
1751 TTGACGGGTT TTCCGAGGTG CGTTCCTCAG GTTTGCAGAT GACCCGTTCG 
1801 CCGGGTGCGC TTTTGGTCTA TCtcggctcg gtattgttgg TTTTGGgtac 
1851 ggtaTttatg tTTTATGTGC GCGAAAAACG GGCGTGGgta tTGTTTTCag 
1901 aCGGCAAAAT CCGTTTTGCT ATGtCTTcgg CCcgcagcga ACGGGATTTG 
1951 cAGAaggaaT TTCCAAAACA CGtcgAGAGC CTGCAACggc tcggcaaggA 
2001 CttgaaTCAT GACTga 

This corresponds to the amino acid sequence <SEQ ID 336; ORF88ng-1>: 

1 MSKSRISPTL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD 
51 YLVKFGPFWT RIFDFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW 
101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE 
151 DGSVLIAAKK GTMNKWGYIF AQVALIVICL GGLIDSNLLL KLGMLAGRIV 
201 PDNQAVYAKD FKPESILGAS NLSFRGNVNI SEGQSADVVF LNADNGMLVQ 
251 DLPFEVKLKK FHIDFYNTGM PRDFASDIEV TDKATGEKLE RTIRVNHPLT 
301 LHGITIYQAS FADGGSDLTF KAWNLRDASR EPVVLKATSI HQFPLEIGKH 
351 KYRLEFDQFT SMNVEDMSEG AEREKSLKST LNDVRAVTQE GKKYTNIGPS 
401 IVYRIRDAAG QAVEYKNYML PILQDKDYFW LTGTRSGLQQ QYRWLRIPLD 
451 KQLKADTFMA LREFLKDGEG RKRLVADATK DAPAEIREQF MLAAENTLNI 
501 FAQKGYLGLD EFITSNIPKG QQDKMQGYFY EMLYGVMNAA LDETIRRYGL 
551 PEWQQDEARN RFLLHSMDAY TGLTEYPAPM LLQLDGFSEV RSSGLQMTRS 
601 PGALLVYLGS VLLVLGTVFM FYVREKRAWV LFSDGKIRFA MSSARSERDL 
651 QKEFPKHVES LQRLGKDLNH D* 

ORF88ng-1 and ORF88-1 show 97.0% identity in 671 aa overlap: 

orf 88-1 -pep MSKSRR3PPLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGSFWA 60 

Mill II I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I M I I M: 
orf88ng-l MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTD YLVKFGPFWT 60 

orf88-l pep QIFGFLGLYDVYASAWFWIMMFLWSTSLCLIRNVPPFWREMKSFREKV KEKSLAAMRH 120 
: I I | | | I I M I I I I I I I I II I I I I I I I I I I I I I 1 I I I I I I I I M I I I I 1 I I I I I I I I I I 

orf 88ng-l rifdflglydvyasawfvvimmflvvstslclirnvppfwremksfrekvkekslaamrh 120 

orf 88-1 .pep SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 180 

i I II I I I I I I I I I I I I I I I : M I I I I :: I I I I M I I I I I I I I I I I I ! M I I : I I I I I I I I 
orf88ng-l S S LLDVKI APEVAKRYLEVRGFQGKTVSREDGSVL IAAKKGTMNKWGYI FAQVALI VI CL 180 

orf 88-1 -pep GGLIDSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNI SEGQSADVVF 240 

I I I I [ I I I I I I I I II : I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I M 
orf8 8ng-l GGLIDSNLLLKLGMLAGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNI SEGQSADVVF 2 40 

orf 88-1 .pep LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 300 

I I I I I I : I II I I I I I I I I I I I I I I I I I I I I I I I I I I II M I I! I I I I I 

orf88ng-l LNADNGMLVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVT DKATGEKLERTIRVNHPLT 300 

orf 88-1. pep LHGITIYQAS FADGGSDLTFKAWNLGDASREPWLKATSIHQFPLEIGKHKYRLEFDQFT 360 

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I M I! I I I I I I I I I I I I I 
orf88ng-l LHGITIYQAS FADGGSDLTFKAWNLRDASREPWLKATSIHQFPLEIGKHKYRLEFDQFT 360 

orf 88-1 .pep SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 
orf8 8ng-l SMNVEDMSEGAEREKSLKSTLNDVRAVTQEGKKYTNIGPSIVYRIRDAAGQAVEYKNYML 420 

orf 88-1 .pep PVLQEQDYFWITGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 480 

orf 88ng-l PILQDKDYFWLTGTRSGLQQQYRWLRIPLDKQLKADTFMALREFLKDGEGRKRLVADATK 4 80 

orf 88-1 . pep GAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKEQQDKMQGYFYEMLYGVMNAA 540 

I II II I II M I I I I I I I I I I I II I I M M I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
orf 88ng-l DAPAEIREQFMLAAENTLNIFAQKGYLGLDEFITSNIPKGQQDKMQGYFYEMLYGVMNAA 54 0 

orf 8 8-1 .pep LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I 1 1 1 1 M II 

orf8 8ng-l LDETIRRYGLPEWQQDEARNRFLLHSMDAYTGLTEYPAPMLLQLDGFSEVRSSGLQMTRS 600 

orf 8 8-1 . pep PGALLVYLGSVLLVLGTVLMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660 

I I I I I I I I I M I I I I I I I : I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I 
orf8 8ng-l PGALLVYLGSVLLVLGTVFMFYVREKRAWVLFSDGKIRFAMSSARSERDLQKEFPKHVES 660 

orf 8 8-1. pep LQRLGKDLNHD 671 

I I I I I I I I I I I 
orf8 8ng-l LQRLGKDLNHD 671 

Furthermore, ORF88ng-1 shows homology with a hypothetical protein from Aquifex aeolicus:

gi|2984296 (AE000771) hypothetical protein [Aquifex aeolicus] Length = 537
Score = 94.4 bits (231), Expect = 2e-18
Identities = 91/334 (27%), Positives = 159/334 (47%), Gaps = 59/334 (17%)

Query: 16  FAFFSSMRFAVALLSLLGIASVIG-TVLQQNQPQTDYLVKFGPFWTRIFDFLGLYDVYAS 74
           + F +S++ A+ ++ +LGI S++G T ++QNQ YL +FG L L DV+ S
Sbjct: 80  YDFLASLKLAIFIMLVLGILSMLGSTYIKQNQSFEWYLDQFGYDVGIWIWKLWLNDVFHS 139

Query: 75  AWFVVIMMFLVVSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRHSSLLDVKIAPEVAK 134
           ++++ ++ L V+ C I+ +P W++ S +E++ + A +H + VKI P+ K
Sbjct: 140 WYYILFIVLLAVNLIFCSIKRLPRVWKQAFS-KERILKLDEHAEKHLKPITVKI-PDKDK 197

Query: 135 --RYLEVRGFQGKTVSREDGSVLIAAKKGTMNKWGYIFAQVALIVICLGGLIDSNLLLKL 192
           ++L +GF+ V E + + A+KG ++ G +AL+VI G LID
Sbjct: 198 VLKFLLKKGFK-VFVEEEGNKLYVFAEKGRFSRLGVYITHIALLVIMAGALID 249

Query: 193 GMLAGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVFLNADNGMLVQDL 252
           +I+G RG++ ++EG + DV+ + A+ L
Sbjct: 250 AIVGV RGSLIVAEGDTNDVMLVGAE--QKPYKL 280

Query: 253 P FE VKLKKFH I DFY NTGMPRDFA SDIEVTDKATGEKLER--TIRVNHPLT 300
           PFVLFIY N++FA SDIE+ + G K+E T++VN P
Sbjct: 281 PFAVHLIDFRIKTYAEENPNVDKRFAQAVSSYESDIEIIN GGKVEAKGTVKVNEPFD 337

Query: 301 LHGITIYQASFA -- DGGSDLT FKAWNLRDASRE P 332
           ++QA++ DG S + + + A +P
Sbjct: 338 FGRYRL FQATYG I L DGTSGMGV I WDRKKAHE DP 371
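
The percentages in the alignment summary above can be checked against the raw counts. The following is a minimal sketch only; the function names are hypothetical, and it assumes (the source does not say so) that the reported percentages are truncated rather than rounded, which is consistent with 159/334 being reported as 47%.

```python
import re

def parse_blast_summary(line):
    """Parse 'Name = count/total (pct%)' fields into {name: (count, total, pct)}."""
    pattern = re.compile(r"(\w+)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")
    return {name: (int(count), int(total), int(pct))
            for name, count, total, pct in pattern.findall(line)}

def floor_percent(count, total):
    """Integer percentage truncated toward zero (assumed reporting convention)."""
    return (100 * count) // total

# Summary line copied from the alignment above.
summary = "Identities = 91/334 (27%), Positives = 159/334 (47%), Gaps = 59/334 (17%)"

for name, (count, total, reported) in parse_blast_summary(summary).items():
    recomputed = floor_percent(count, total)
    print(f"{name}: {count}/{total} -> {recomputed}% (reported {reported}%)")
```
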

Based on this analysis, including the putative transmembrane domain in the gonococcal protein, 
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 40 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
337>: 



1 ATGATGAGTA ATAmAATGGm ACAAAAAGGG TTTACATTGA TTGmGmTGAT 

51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT 

101 ATCmAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG 

151 GyCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA 

201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA

251 AGATGAATCC GAAAATTGCC AAAAAaTATA GTGTTTCGGT AAAGTTTGTC

301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA

401 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA

451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA

This corresponds to the amino acid sequence <SEQ ID 338; ORF89>:



1 MMSNXMXQKG FTLIXXMIVV AILGIISVIA IPSYXSYIEK GYQSQLYTEM

51 XGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS

151 DVGCEAFSNR KK*
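
The correspondence between a nucleotide listing and its amino acid sequence can be reproduced with the standard genetic code. The sketch below is illustrative only: the function name `translate` is hypothetical, and it makes the simplifying assumption that any codon containing an ambiguous or uncertain base (for example `m`, `y` or `N`) is reported as `X`, which is how the ambiguous positions of ORF89 appear above.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def translate(dna):
    """Translate a DNA string in frame 1; unknown codons become 'X', stops '*'."""
    dna = "".join(dna.split()).upper()  # drop the spacing used in the listings
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)

# First 30 bases of the partial (SEQ ID 337) and complete (SEQ ID 339) sequences.
print(translate("ATGATGAGTA ATAmAATGGm ACAAAAAGGG"))
print(translate("ATGATGAGTA ATAAAATGGA ACAAAAAGGG"))
```

This simplification is stricter than necessary: a codon such as CCN still encodes proline whatever the last base is, so a fuller treatment would resolve such cases instead of emitting `X`.
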

Further work revealed the complete nucleotide sequence <SEQ ID 339>:



1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT 

51 GATAGTCGTC GCGATACTCG GCATTATCAG CGTCATTGCC ATACCTTCTT 

101 ATCAAAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG

151 GTCGGTATCA ACAATATTTC CAAACAGTTT ATTTTGAAAA ATCCCCTGGA

201 CGATAATCAG ACCATCGAGA ACAAACTGGA AATATTTGTC TCAGGCTATA

251 AGATGAATCC GAAAATTGCC AAAAAATATA GTGTTTCGGT AAAGTTTGTC 

301 GATAAGGAAA AATCAAGGGC ATACAGGTTG GTCGGCGTTC CGAAGGCGGG 

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA 

401 AATGCCGTGA TGCCGCTTCT GCCCAAGCCC ATTTGGAGAC CTTGTCCTCA 

451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAA 

This corresponds to the amino acid sequence <SEQ ID 340; ORF89-1>:

1 MMSNKMEQKG FTLIEMMIVV AILGIISVIA IPSYQSYIEK GYQSQLYTEM

51 VGINNISKQF ILKNPLDDNQ TIENKLEIFV SGYKMNPKIA KKYSVSVKFV

101 DKEKSRAYRL VGVPKAGTGY TLSVWMNSVG DGYKCRDAAS AQAHLETLSS

151 DVGCEAFSNR KK*

Computer analysis of this amino acid sequence gave the following results: 
Homology with PilE of N. gonorrhoeae (accession number Z69260). 
ORF89 and PilE protein show 30% aa identity in 120 aa overlap:

orf89 8 QKGFTLIXXMIWAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQFILKNPL- 66 

QKGFTLI MIV+AI+GI++ +A+P+Y Y+ S+ G+ ++L++ 

PilE 5 QKGFTLIELMIVIAIVGILAAVALPAYQDYTARAQVSEAILLAEGQKSAVTEYYLNHGIW 64 

orf 8 9 67 -DDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGYTLSVW 125 

DN + +G + KI KY SV + GV K G LS+W 

PilE 65 PKDNTS AGVASSDKIKGKYVQSVTVAKGVVTAEMASTGVNKEIQGKKLSLW 115 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF89 shows 83.3% identity over a 162aa overlap with an ORF (ORF89a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 8 9 . pep MMSNXMXQKGFTLIXXMIWAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 

orf8 9a MMSNKMEQKGFTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEIWGINNISKQX 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 8 9 . pep I LP<NPLDDNQTIENKLE I FVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRL VGVPKAGTGY 

orf 8 9a I LKNPLDDNQT IKSKLE I FVSGYKMNPKIAEKYNVS VHFVNEEKPRAYS LVGVPKTGTGY 

70 80 90 100 110 120 

130 140 150 160 

orf 8 9 . pep TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 
U I I II I I I II I H I II I I ! I : I I I I I | | | | | | | | | | | | | | | | 
O r f 8 9 a TLSVWMN SVGDGYKCRDAAS ARAHLETL S S DVGCE AFSNRKKX 

130 140 150 160 

The complete length ORF89a nucleotide sequence <SEQ ID 341> is:

1 ATGATGAGTA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGNGANGNT
51 NATNGNCNTC GCGATACNCN GCNTTANCAG CGTCATTNCN ATNNNTNCNT
101 ATCNNAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG
151 GTCGGTATCA ACAATATTTC CAAACAGTNT ATTTTGAAAA ATCCCCTGGA
201 CGATAATCAG ACCATCAAGA GCAAACTGGA AATATTTGTC TCAGGCTATA
251 AGATGAATCC GAAAATTGCC GAAAAATATA ATGTTTCGGT GCATTTTGTC
301 AATGAGGAAA AACCNAGGGC ATACAGCTTG GTCGGCGTTC CAAAGACGGG
351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA
401 AATGCCGTGA TGCCGCTTCT GCCCGAGCCC ATTTGGAGAC CTTGTCCTCA
451 GATGTCGGCT GTGAAGCCTT CTCTAATCGT AAAAAATAG

This encodes a protein having amino acid sequence <SEQ ID 342>: 

1 MMSNKMEQKG FTLIXXXXXX AIXXXXSVIX XXXYXSYIEK GYQSQLYTEM
51 VGINNISKQX ILKNPLDDNQ TIKSKLEIFV SGYKMNPKIA EKYNVSVHFV
101 NEEKPRAYSL VGVPKTGTGY TLSVWMNSVG DGYKCRDAAS ARAHLETLSS
151 DVGCEAFSNR KK*

ORF89a and ORF89-1 show 83.3% identity in 162 aa overlap: 

5 10 20 30 40 50 60 

orf8 9a pep MMSNKMEQKGFTLIXXXXXXAIXXXXSVIXXXXYXSYIEKGYQSQLYTEMVGINNISKQX 

| | | | | | | | | | | | | | II 111 I I I I I I I I I I I I I I I I I I I I I I I I I 

orf8 9-l MMSNKMEQKGFTLIEMMIWAILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNISKQF 
10 20 30 40 50 60 

^ 70 80 90 100 110 120 

orf89a pep ILKNPLDDNQTIKSKLEIFVSGYKMNPKIAEKYNVSVHFVNEEKPRAYSLVGVPKTGTGY 
| | | | I I I I I I M :: I I I I I I I I I I I I I I I I : I I : I I I : I I I I I I I I 1 I I 1 I = I I I I 
orf 89-1 ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 

15 -70 80 90 100 110 120 

130 140 150 160 

O r f 8 9 a . pep T L S VWMNS VGDG YKCRDAAS ARAHLET L S S DVGCE AFSNRKKX 
II I I I I I I I I I I I I I I I I M I : I I I I I I I II II I I I I I I I I I I 
20 orf89-l TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 

130 140 150 160 

Homology with a predicted ORF from N. gonorrhoeae

ORF89 shows 84.6% identity over a 162aa overlap with a predicted ORF (ORF89.ng) from N.
gonorrhoeae:

orf 8 9 MMSNXMXQKGFTLIXXMIWAILGIISVIAIPSYXSYIEKGYQSQLYTEMXGINNISKQF 60 

INI I I I I I I I I I I I I : I I I I I M I I I I I I I I I I I I I I I I I I I I I MM: Ml 
orf8 9ng MMSNKMEQKGFTLIEMMIWTILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNVLKQF 60 

30 orf89 ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 120 

I I : I : : : I I : I II I 1 II I I I I I I II II I I I : II I II I I I I I I I I I : I I II I 

orf 8 9ng ILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY 120 

orf 8 9 TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKK 162 

35 I I I I II I I I I I I I I I I M : I I I I : : I I I : I I I I I I I 1 I I I 

orf9 9ng TLSVWMN S VGDGYKCRDAT S AQAY S DT LS AD S GCE AFSNRKK 162 

The complete length ORF89ng nucleotide sequence <SEQ ID 343> is: 

1 aTGATGAGCA ATAAAATGGA ACAAAAAGGG TTTACATTGA TTGAGATGAT 

51 GATAGTTGTC ACGATACTCG GCATCATCAG CGTCATTGCC ATACCTTCTT

101 ATCAGAGTTA TATTGAAAAA GGCTATCAGT CCCAGCTTTA TACGGAGATG

151 GTCGGTATCA ACAATGTTCT CAAACAGTTT ATTTTGAAAA ATCCCCAGGA

201 CGATAATGAT ACCCTCAAGA GCAAACTGAA AATATTTGTC TCAGGCTATA

251 AGATGAATCC GAAAAttgCC AAAAAATATA GTGTTTCGGt aaggtttGTC

301 gatGCGGAAA AACCAAGGGC ATACAGGTTG GTCGGCGTTC CGAACGCGGG

351 GACGGGTTAT ACTTTGTCGG TATGGATGAA CAGCGTGGGC GACGGATACA

401 AATGCCGTGA TGCCACTTCT GCCCAGGCCT ATTCGGACAC CTTGTCCGCA

451 GATAGCGGCT GTGAAGCTTT CTCTAATCGT AAAAAATAG

This encodes a protein having amino acid sequence <SEQ ID 344>: 

1 MMSNKMEQKG FTLIEMMIVV TILGIISVIA IPSYQSYIEK GYQSQLYTEM
51 VGINNVLKQF ILKNPQDDND TLKSKLKIFV SGYKMNPKIA KKYSVSVRFV
101 DAEKPRAYRL VGVPNAGTGY TLSVWMNSVG DGYKCRDATS AQAYSDTLSA
151 DSGCEAFSNR KK*

This gonococcal protein has a putative leader peptide (underlined) and N-terminal methylation site
(NMePhe or type-4 pili, double-underlined). In addition, ORF89ng and ORF89-1 show 88.3%
identity in 162 aa overlap:

10 20 30 40 50 60 

orf8 9-l pep MMSNKMEQKGFTLIEMMIWAILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNISKQF 

| M M | I I I I I I I I I I I I I I :! I I M I I I I I I I I I I I I I I I 1 I I I I I I I I : IN 

orfR9na MMSNKMEQKGFTLIEMMIVVTILGIISVIAIPSYQSYIEKGYQSQLYTEMVGINNVLKQF 
^ 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 8 9-1 pep ILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFVDKEKSRAYRLVGVPKAGTGY 
HIM | | | : 1 ::: | | : I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I : I I I I I 
orf 8 9na ILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFVDAEKPRAYRLVGVPNAGTGY 
y 70 80 90 100 110 120 

130 140 150 160 

orf89-l pep TLSVWMNSVGDGYKCRDAASAQAHLETLSSDVGCEAFSNRKKX 
I | | | I I I I I I I I I I I I I I : I I I I : : I I I : I I I I M I I I I M 
orf89ng TLSVWMNSVGDGYKCRDATSAQAYSDTLSADSGCEAFSNRKKX 
130 140 150 160 
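
The identity figure quoted for this alignment can be recomputed directly from the two protein sequences, since the alignment contains no gaps. The following is a minimal sketch; the function name is hypothetical, and the sequences are transcribed from the ORF89-1 and ORF89ng listings above on the assumption that the OCR-damaged run `MIW` reads `MIVV`, as the corresponding codons indicate.

```python
def percent_identity(seq_a, seq_b):
    """Return (identical positions, percent identity) for two equal-length aligned strings."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return matches, 100.0 * matches / len(seq_a)

# ORF89-1 (SEQ ID 340) and ORF89ng (SEQ ID 344), stop codon omitted.
ORF89_1 = ("MMSNKMEQKGFTLIEMMIVVAILGIISVIAIPSYQSYIEKGYQSQLYTEM"
           "VGINNISKQFILKNPLDDNQTIENKLEIFVSGYKMNPKIAKKYSVSVKFV"
           "DKEKSRAYRLVGVPKAGTGYTLSVWMNSVGDGYKCRDAASAQAHLETLSS"
           "DVGCEAFSNRKK")
ORF89NG = ("MMSNKMEQKGFTLIEMMIVVTILGIISVIAIPSYQSYIEKGYQSQLYTEM"
           "VGINNVLKQFILKNPQDDNDTLKSKLKIFVSGYKMNPKIAKKYSVSVRFV"
           "DAEKPRAYRLVGVPNAGTGYTLSVWMNSVGDGYKCRDATSAQAYSDTLSA"
           "DSGCEAFSNRKK")

matches, pct = percent_identity(ORF89_1, ORF89NG)
print(f"{matches}/{len(ORF89_1)} identical positions = {pct:.1f}%")
```

Under these assumptions the count is 143 identical positions out of 162, which rounds to the 88.3% stated in the text.
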

Based on this analysis, including the gonococcal motifs and the homology with the known PilE 
protein, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF89-1 (13.6kDa) was cloned in the pGex vector and expressed in E. coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 11A
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein
was used to immunise mice, whose sera gave a positive result in the ELISA test, confirming that
ORF89-1 is a surface-exposed protein, and that it is a useful immunogen.

Example 41 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 345>: 

1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGsG CACCG.GTCC GACG.GCAAA

251 AACAAGCGTT GGCCn.AGAA TTTCAACCC...

This corresponds to the amino acid sequence <SEQ ID 346; ORF91>: 

1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA 
51 RQKAEAYAIP YFDFQRMTAL AVGNPWXTXS DXQKQALAXE FQP. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 347>: 

1 ATGAAAAAAT CCTCCCTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAGCCAA ATCCGTCAAA 

101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA ACGGCGATGC CAACACCGCT 

151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT 

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC

301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC

351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG

401 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC

451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC

501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG

551 GACTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A

This corresponds to the amino acid sequence <SEQ ID 348; ORF91-1>:

1 MKKSSLISAL GIGILSIGMA FAAPADAVSQ IRQNATQVLS ILKNGDANTA

51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS

101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG

151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGGK*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF91 shows 92.4% identity over a 92aa overlap with an ORF (ORF91a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf91 pep MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 

I | | ] | : | | I I I M I I I I I I II I I I I I 1 I : I I I 1 I I I M : I I I I I I I I I I I I I ! I I 

orf 91a MKKSSFISALGIGILSIGMAFAAPADAVNQIRQNATQVLSILKSGDANTARQKAEAYAIP 



YFDFQRMTALAVGNPWXTXSDXQKQALAXEFQP 



orf 91a KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGASLVTVYRNQFGEIIKAK 
130 140 150 160 170 180 

The complete length ORF91a nucleotide sequence <SEQ ID 349> is:

1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT
51 CGGCATGGCA TTTGCCGCCC CTGCCGACGC GGTAAACCAA ATCCGTCAAA
101 ACGCCACTCA AGTATTGAGC ATCTTAAAAA GCGGTGATGC CAACACCGCC
151 CGCCAAAAAG CCGAAGCCTA TGCGATTCCC TATTTCGATT TCCAACGTAT
201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG CACCGCGTCC GACGCGCAAA
251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC
301 GGCACGATGC TGAAATTAAA AAACGCCAAC GTCAACGTCA AAGACAATCC
351 CATCGTCAAT AAAGGCGGCA AAGAAATCAT CGTCCGCGCC GAAGTCGGCG
401 TACCCGGGCA AAAACCCGTC AACATGGACT TCACCACCTA CCAAAGCGGC
451 GGTAAATACC GTACCTACAA CGTCGCCATC GAAGGCGCGA GCCTGGTTAC
501 CGTGTACCGC AACCAATTCG GCGAAATTAT CAAAGCGAAA GGCGTGGACG
551 GACTGATTGC CGAGTTGAAG GCTAAAAACG GCAGCAAGTA A

This encodes a protein having amino acid sequence <SEQ ID 350>: 

1 MKKSSFISAL GIGILSIGMA FAAPADAVNQ IRQNATQVLS ILKSGDANTA

51 RQKAEAYAIP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS

101 GTMLKLKNAN VNVKDNPIVN KGGKEIIVRA EVGVPGQKPV NMDFTTYQSG

151 GKYRTYNVAI EGASLVTVYR NQFGEIIKAK GVDGLIAELK AKNGSK*

ORF91a and ORF91-1 show 98.0% identity in 196 aa overlap: 

10 20 30 40 50 60 

orf91a.pep MKKSSFISALGIGILSI GMAFAAPADAVNQ IRQNATQVLS I LKS GDANT ARQKAE AYAI P 
45 I I I I I : I I I I I I I I I I I I I I I I I I I I I I : I I I I ! I II II I I I I : I I I I I I I I I I I II I I I 

orf 91-1 MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 

10 20 30 40 50 60 

70 80 90 100 110 120 

50 orf 91a . pep Y FD FQRMTALAVGNPWRT AS DAQKQALAKE FQT LL IRT YS GTMLKLKNAN VNVKDN P I VN 

I I I I I I I I I I I I I M I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
or f 9 1 - 1 Y FD FQRMTALAVGNPWRT AS DAQKQALAKE FQT LL I RT Y S GTMLKLKNAN VNVKDN P I VN 

70 80 90 100 110 120 
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I M | l | | | M I I I I I I I I I I I I I 1 I I I I I I I I I I M I I I I I I I I I I I I 

OT . f qi_l KGGKEIIVRAEVGVPGQKPVNMDFTTYQSGGKYRTYNVAIEGA3LVTVYRNQFGEIIKAK 
130 140 150 160 170 180 

190 

orf91a.pep GVDGLIAELKAKNGSKX 
I II I i I II M 1 I I I : M 
or f 9 1-1 GVDGLIAELKAKNGGKX 
190 

Homology with a predicted ORF from N. gonorrhoeae 

ORF91 shows 84.8% identity over a 92aa overlap with a predicted ORF (ORF91.ng) from N. 
gonorrhoeae: 

orf91 pep MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 60 

: i I I I : I I I I I I M I I I : I I I I I : I I I I : I I I : 1 I I : I I I I I I I I : I 

orf91ng VKKSSFISALGIGILSIGMAFASPADAVGQIRQNATQVLTILKSGDAASARPKAEAYAVP 60 

orf91.pep YFDFQRMTALAVGNPWXTXSDXQKQALAXEFQP 93 

I I I I I I M I i I II I I I I II I I I I I I III 
orf 91ng YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPIVN 120 

The complete length ORF91ng nucleotide sequence <SEQ ID 351> is predicted to encode a protein
having amino acid sequence <SEQ ID 352>:

1 VKKSSFISAL GIGILSIGMA FASPADAVGQ IRQNATQVLT ILKSGDAASA

51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS

101 GTMLKFKNAT VNVKDNPIVN KGGKEIVVRA EVGIPGQKPV NMDFTTYQSG

151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK*

Further work revealed the complete nucleotide sequence <SEQ ID 353>: 

1 ATGAAAAAAT CCTCCTTCAT CAGCGCATTG GGCATCGGTA TTTTGAGCAT 

51 CGGCATGGCA TTTGCCTCCC CGGCCGACGC AGTGGGACAA ATCCGCCAAA 

101 ACGCCACACA GGTTTTGACC ATCCTCAAAA GCGGCGACGC GGCTTCTGCA

151 CGCCCAAAAG CCGAAGCCTA TGCGGTTCCC TATTTCGATT TCCAACGTAT

201 GACCGCATTG GCGGTCGGCA ACCCTTGGCG TACCGCGTCC GACGCGCAAA

251 AACAAGCGTT GGCCAAAGAA TTTCAAACCC TGCTGATCCG CACCTATTCC

301 GGCACGATGC TGAAATTCAA AAACGCGACC GTCAACGTCA AAGACAATCC

351 CATCGTCAAT AAGGGCGGCA AGGAAATCGT CGTCCGTGCC GAAGTCGGCA

401 TCCCCGGTCA GAAGCCCGTC AATATGGACT TTACCACCTA CCAAAGCGGC

451 GGCAAATACC GTACCTACAA CGTCGCCATC GAAGGCACGA GCCTGGTTAC

501 CGTGTACCGC AACCAATTCG GCGAAATCAT CAAAGCCAAA GGCATCGACG

551 GGCTGATTGC CGAGTTGAAA GCCAAAAACG GCGGCAAATA A 

This corresponds to the amino acid sequence <SEQ ID 354; ORF91ng-1>:

1 MKKSSFISAL GIGILSIGMA FASPADAVGQ IRQNATQVLT ILKSGDAASA

51 RPKAEAYAVP YFDFQRMTAL AVGNPWRTAS DAQKQALAKE FQTLLIRTYS

101 GTMLKFKNAT VNVKDNPIVN KGGKEIVVRA EVGIPGQKPV NMDFTTYQSG

151 GKYRTYNVAI EGTSLVTVYR NQFGEIIKAK GIDGLIAELK AKNGGK*

ORF91ng-1 and ORF91-1 show 92.3% identity in 196 aa overlap:

10 20 30 40 50 60 

orf 91-1 . pep MKKSSLISALGIGILSIGMAFAAPADAVSQIRQNATQVLSILKNGDANTARQKAEAYAIP 
I I I I I : I I I I I I I I I I I I I I I I : I II I I : I I I I I I I I I I : I I I : I I I : I I I I I I I I : I 
orf 91ng-l MKKSSFISALGIGILSIGMAFASPADAVGQIRQNATQVLTILKSGDAASARPKAEAYAVP 
50 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 91-1 . pep YFDFQRMTALAVGNPWRTAS DAQKQALAKE FQTLLIRTYSGTMLKLKNANVNVKDNP I VN 
I I II I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I : I I I = I I II I I I I I I 
55 orf 91ng-l YFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPIVN 

70 80 90 100 110 120 
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130 140 150 160 170 180 

or f 9 1- 1 pep KGGKE I IVRAEVGVPGQKPVNMDFTTYQSGGKYRT YNVAIEGASLVTVYRNQFGE I IKAK 

| | | | | | : | | | M I : I I I II I I I I I I I M I ! I I I I I I I : I II I I I I I I 1 M I 1 1 I I 

orf91ncf-l KGGKEIWRAEVGIPGQKPVNMDFTTYQSGGKYRTYNVAIEGTSLVTVYRNQFGEIIKAK 
130 140 150 160 170 180 

190 

orf 91-1 . pep GVDGLIAELKAKNGGKX 
I : I I I I II I I I I I M I ! 

orf 91ng-l 

In addition, ORF91ng-1 shows homology to a hypothetical E. coli protein:

sp|P45390|YRBC_ECOLI HYPOTHETICAL 24.0 KD PROTEIN IN MURA-RPON INTERGENIC REGION PRECURSOR (F211)
>gi|606130 (U18997) ORF_f211 [Escherichia coli]
>gi|1789583 (AE000399) hypothetical 24.0 kD protein in murZ-rpoN intergenic region [Escherichia coli]
Length = 211

Score = 70.6 bits (170), Expect = 6e-12
Identities = 42/137 (30%), Positives = 76/137 (54%), Gaps = 6/137 (4%)

Query: 59  VPYFDFQRMTALAVGNPWRTASDAQKQALAKEFQTLLIRTYSGTMLKFKNATVNVKDNPI 118
           +PY + AL +G +++A+ AQ++A F+L + Y + + T + P
Sbjct: 65  LPYVQVKYAGALVLGQYYKSATPAQREAYFAAFREYLKQAYGQALAMYHGQTYQIA--PE 122

Query: 119 VNKGGKEIV-VRAEVGIP-GQKPVNMDFTTYQSG--GKYRTYNVAIEGTSLVTVYRNQFG 174
           G K IV +R + P G+ PV +DF ++ G ++ Y++ EG S++T +N++G
Sbjct: 123 QPLGDKTIVPIRVTIIDPNGRPPVRLDFQWRKNSQTGNWQAYDMIAEGVSMITTKQNEWG 182

Query: 175 EIIKAKGIDGLIAELKA 191
           +++ KGIDGL A+LK+
Sbjct: 183 TLLRTKGIDGLTAQLKS 199

Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 42 

The following DNA sequence was identified in N. meningitidis <SEQ ID 355>: 

1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC
51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACTCAAAAC GAAACCGCTA
101 TGATCACGCA TACCCTCATC TCAAAATACA GTTTTGGnnn nnnnnnnnnn
151 nnnnnnnnnn nnGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT
201 CGACCATCAG GAAGCCGCAC GCCGAAACGG CTTAACGATG CAGCCGGCAA
251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA
301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC
351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG
401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA
451 AAACTGATAC AAAAAACCGT AGGCGAATAA

This corresponds to the amino acid sequence <SEQ ID 356; ORF97>:

1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMITHTLI SKYSFGXXXX 
51 XXXXAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK 
101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE 
151 KLIQKTVGE* 

Further work revealed the complete nucleotide sequence <SEQ ID 357>:

1 ATGAAACACA TACTCCCCCT GATTGCCGCA TCCGCACTCT GCATTTCAAC
51 CGCTTCGGCA CATCCTGCCA GCGAACCGTC CACCCAAAAC GAAACCGCTA
101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC
151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT
201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA
251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GCACGCCGCT GATGGTCAAA
301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTA CGCGTCCTCG TTACCGAAAC
351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG
401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA
451 AAACTGATAC AAAAAACCGT AGGCGAATAA

This corresponds to the amino acid sequence <SEQ ID 358; ORF97-1>:

1 MKHILPLIAA SALCISTASA HPASEPSTQN ETAMTTHTLT SKYSFDETVS
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK
101 DPAFALQLPL RVLVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE
151 KLIQKTVGE*

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A)

ORF97 shows 88.7% identity over a 159aa overlap with an ORF (ORF97a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 97 pep MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG 
20 Mill! I I I I I I 1 I M I I I I I I : I ! I I I I I I 1 I I Mill : : I I I I M 

orf 97a MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 
10 20 30 40 50 60 

70 80 90 100 110 120 

25 orf97 pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 

I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I II M I I I I I I I I I I ! I I I 
orf 97a MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK 

70 80 90 100 110 120 

30 130 140 150 160 

orf 97 .pep VRAAYT DT RAL I AG SR I G FDEVANT LANAEKL I QKTVGEX 

II I I I M II II I I I I I M I I I II I I I I I I I I I I I I I : I M 
o r f 9 7 a VRAAYT DTRAL I AG S RI G FDEVANT LANAEKLI QKT I GEX 

130 140 150 160 

The complete length ORF97a nucleotide sequence <SEQ ID 359> is:

1 ATGANACACA TACTCCCCCT GANTGNCGCA TCCGCACTCT GCATTTCAAC
51 CGCTTCGGNN CATCCTGCCA GCGAACCGCA AACCCAAAAC GAAACCGCTA
101 TGACCACGCA TACCCTCACC TCAAAATACA GTTTTGACGA AACCGTCAGC
151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT
201 CGACCATCAG GAAGCCGCCC GCCGAAACGG CTTAACGATG CAGCCGGCAA
251 AAGTCATCGT CTTCGGCACG CCCAAAGCCG GTACGCCGCT GATGGTCAAA
301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCNTCG TTACCGAAAC
351 GGACGGCAAA GTACGCGCCG CCTATACCGA TACGCGCGCC CTCATCGCCG
401 GCAGCCGCAT CGGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA
451 AAACTGATAC AAAAAACCAT AGGCGAATAA

This encodes a protein having amino acid sequence <SEQ ID 360>:

1 MXHILPLXXA SALCISTASX HPASEPQTQN ETAMTTHTLT SKYSFDETVS
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK
101 DPAFALQLPL RVXVTETDGK VRAAYTDTRA LIAGSRIGFD EVANTLANAE
151 KLIQKTIGE*

ORF97a and ORF97-1 show 95.6% identity in 159 aa overlap: 

10 20 30 40 50 60 

orf 97a . pep MXHILPLXXASALCISTASXHPASEPQTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 

55 orf 97-1 MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLE TAIKSKG 
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10 20 30 40 50 60 

70 80 90 100 110 120 

MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVXVTETDGK 

I I I I I I I I I I I M I I I I I I I I M I I I I I I I I M I I I I I I I N I I I I I I I I I I I I I I I I I 

MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 
70 80 90 100 110 120 

130 140 150 160 

VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTIGEX 
| | | | I I I I I I I I I I I I I I I I I I I I I M I I I i M I I I : I I I 
VRAAYTDTRAL I AGSRI GFDEVANTLANAEKL IQKTVGEX 

130 140 150 160 



Homology with a predicted ORF from N. gonorrhoeae 

ORF97 shows 88.1% identity over a 159aa overlap with a predicted ORF (ORF97.ng) from N. 
gonorrhoeae: 

orf97 pep MKHILPLIAASALCISTASAHPASEPSTQNETAMITHTLISKYSFGXXXXXXXXAIKSKG 60 

I I I I I I I I I I I : I I I II I I II I : : I I I I II I I I I I I I I I I I ■ : M I I I I 

orf97ng MKHILPPIAASAFCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 60 

orf97 pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 120 

I I I I I I I I i I I I II I I I I I I I I I 1 I 1 I I I I I I I I I I M I I 

orf 97ng MDIFAVIDHQEAARRNGLTMQPAKVIVFGTPKAGTPLMVKDPAFALQLPLRVLVTETDGK 120 

orf97.pep VRAAYTDTRALIAGSRIGFDEVANTLANAEKLIQKTVGE 159 

I I : I I I I I I I I I : I I I i : II I II 1 I I I I I I I I I I II I II 
orf97ng VRT AYT DT RALI VG SR I S FDEVANTLANAEKL I QKTVGE 159 

The complete length ORF97ng nucleotide sequence <SEQ ID 361> is predicted to encode a protein 
having amino acid sequence <SEQ ID 362>: 

1 MKHILPPIAA SAFCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS 
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK
101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE
151 KLIQKTVGE*

Further work revealed the complete nucleotide sequence <SEQ ID 363>: 



1 ATGAAACACA TACTCCCcct gatcgccgca TccgcactCT GCATTTCAAC
51 CGCTTCGGCA CACCCTGCCG GCAAACCGCC CACCCAAAAC GAAACCGCTA
101 TGACCACGCA CACCCTCACC TCGAAATACA GTTTTGACGA AACCGTCAGC
151 CGCCTTGAAA CCGCCATAAA AAGCAAAGGG ATGGACATTT TTGCCGTCAT
201 CGACCATCAG GAAGCGGCAC GCCGAAACGG CCTGACCATG CAGCCGGCAA
251 AAGTCATCGT CTTCGGCACG CCCAAGGCCG GTACGCCgct GATGGTCAAA
301 GACCCCGCCT TCGCCCTGCA ACTGCCCCTG CGCGTCCTCG TTACCGAAAC
351 GGACGGCAAA GTACGCACCG CCTATACCGA TACGCGCGCC CTCATCGTCG
401 GCAGCCGCAT CAGTTTCGAC GAAGTGGCAA ACACTTTGGC AAACGCCGAA
451 AAACTGATAC AAAAAACCGT AGGCGAATAA

This corresponds to the amino acid sequence <SEQ ID 364; ORF97ng-1>:

1 MKHILPLIAA SALCISTASA HPAGKPPTQN ETAMTTHTLT SKYSFDETVS
51 RLETAIKSKG MDIFAVIDHQ EAARRNGLTM QPAKVIVFGT PKAGTPLMVK
101 DPAFALQLPL RVLVTETDGK VRTAYTDTRA LIVGSRISFD EVANTLANAE
151 KLIQKTVGE*

ORF97ng-1 and ORF97-1 show 96.2% identity in 159 aa overlap:

10 20 30 40 50 60 

orf 97-1 .pep MKHILPLIAASALCISTASAHPASEPSTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 
I I I I I I I I I II II I I I I I II I I I : : I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 97ng-l MKHILPLIAASALCISTASAHPAGKPPTQNETAMTTHTLTSKYSFDETVSRLETAIKSKG 

10 20 30 40 50 60 






70 80 90 100 110 120 

orf 97-1 pep MDIFAVIDHQEAARRNGLTMQPAKVIVFGT PKAGTPLMVKDPAFALQLPLRVLVTETDGK 
| | | | | | | | | | | I I I 1 I 1 I M I I I I I I I I I I I I I I I I M M I II I I I I i I i I I I i M I I I I 
5 orf97na-l MDIFAVIDHQEAARRNGLTMQPAKVIVFGT PKAGTPLMVKDPAFALQLPLRVLVTETDGK 

70 80 90 100 110 120 

130 140 150 160 

orf97-l pep VRAAYTDTRAL I AGSRI GFDEVANTLANAEKL IQKTVGEX 
10 II = I I I I I I M I : I I I I : II I I 1 M I I I I I I M I 1 I I I I I 

orf 97ng-l VRTAYTDTRALIVGSRISFDEVANTLANAEKL IQKTVGEX 

130 140 150 160 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal
protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

ORF97-1 (15.3kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figures
12A & 12B show, respectively, the results of affinity purification of the GST-fusion and His-fusion
proteins. Purified GST-fusion protein was used to immunise mice, whose sera were used for
Western Blot (Figure 12C), ELISA (positive result), and FACS analysis (Figure 12D). These
experiments confirm that ORF97-1 is a surface-exposed protein, and that it is a useful immunogen.



Figure 12E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF97-1. 
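
A profile of the kind plotted in Figure 12E can be approximated with a sliding-window average over a per-residue scale. The sketch below is illustrative only: the source does not state which scale or window size was used, so the Kyte-Doolittle hydropathy scale and a 9-residue window are assumptions, and the function name is hypothetical. On this scale positive values are hydrophobic, so a hydrophilicity plot would show the opposite sign.

```python
# Kyte-Doolittle hydropathy values (assumed scale; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=9):
    """Mean hydropathy of every full window along the sequence."""
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]

# First 30 residues of ORF97-1 (SEQ ID 358), which include the putative leader.
n_terminus = "MKHILPLIAASALCISTASAHPASEPSTQN"
for start, value in enumerate(hydropathy_profile(n_terminus), start=1):
    print(f"{start:3d} {value:6.2f}")
```
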



Example 43 



The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID
365>:



1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC 

51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA 

101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC 

151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGg 

201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG

251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACaATATT

301 GACTACAAAC TGAGTTTCCA TCCGCTGACc AaACGCTACC GCGTTACCgT

351 CGgCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA

401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT

451 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC

501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC 

551 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA 

This corresponds to the amino acid sequence <SEQ ID 366; ORF106>: 

1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS 

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI

101 DYKLSFHPLT KRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG 

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK* 

Further work revealed the following DNA sequence <SEQ ID 367>: 



45 



1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC AGTAAATGGC TGATTGTGCC 
51 GCTGATGCTC CCCGCCTTTC AGAATGTGGC GGCGGAGGGG ATAGATGTGA 
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101 GCCGTGCCGA AGCGAGGATA ACCGACGGCG GGCAGCTTTC CATCAGCAGC 

151 CGCTTCCAAA CCGAGCTGCC CGACCAGCTC CAACAGGCGT TGCGCCGGGG 

201 CGTGCCGCTC AACTTTACCT TAAGCTGGCA GCTTTCCGCC CCGATAATCG 

251 CTTCTTATCG GTTTAAATTG GGGCAACTGA TTGGCGATGA CGACAATATT 

301 GACTACAAAC TGAGTTTCCA TCCGCTGACC AACCGCTACC GCGTTACCGT 

351 CGGCGCGTTT TCGACAGACT ACGACACCTT GGATGCGGCA TTGCGCGCGA 

401 CCGGCGCGGT TGCCAACTGG AAAGTCCTGA ACAAAGGCGC GCTGTCCGGT 

451 GCGGAAGCAG GGGAAACCAA GGCGGAAATC CGCCTGACGC TGTCCACTTC 

501 AAAACTGCCC AAGCCTTTTC AAATCAATGC ATTGACTTCT CAAAACTGGC 

551 ATTTGGATTC GGGTTGGAAA CCTCTAAACA TCATCGGGAA CAAATAA 

This corresponds to the amino acid sequence <SEQ ID 368; ORF106-1>: 

1 MAFITRLFKS SKWLIVPLML PAFQNVAAEG IDVSRAEARI TDGGQLSISS 

51 RFQTELPDQL QQALRRGVPL NFTLSWQLSA PIIASYRFKL GQLIGDDDNI 

101 DYKLSFHPLT NRYRVTVGAF STDYDTLDAA LRATGAVANW KVLNKGALSG 

151 AEAGETKAEI RLTLSTSKLP KPFQINALTS QNWHLDSGWK PLNIIGNK* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF106 shows 87.4% identity over a 199aa overlap with an ORF (ORF106a) from strain A of N. 
meningitidis: 

10 20 30 40 50 59 

orf106.pep MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ 
I I 1 I I I I I I I I ||:: II : : : : I I I I I I I I I II I I I : I I I I I I II II I I I I I I 
orf106a MAFITRLFKSIKQWLVLLPMLSVLPDAAAEGIDVSRAEARIXDGGQLSXXSRFQTELPDQ 

10 20 30 40 50 60 

60 70 80 90 100 110 119 

orf106.pep LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA 






120 130 140 150 160 170 179 

orf106.pep FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 
III I I I II I I I I I I I I II I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I M I I I I 
orf106a FSTXYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 

130 140 150 160 170 180 

180 190 199 

orf106.pep SQNWHLDSGWKPLNIIGNKX 
I II I I I I N II I I I I II II I 

orf106a SQNWHLDSGWKPLNIIGNKX 
190 200 

Due to the K->N substitution at residue 111, the homology between ORF106a and ORF106-1 is 
87.9% over the same 199 aa overlap. 
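
The effect of a single substitution on the reported figure can be reproduced with a short calculation. The Python sketch below is illustrative only; the helper name `percent_identity` is hypothetical, and the assumption that 87.4% corresponds to 174 identical positions out of 199 is an inference from the percentage, since the alignment program's raw counts are not given in the text.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # A column counts as identical only if both residues match and neither is a gap.
    identical = sum(1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-")
    return 100.0 * identical / len(aligned_a)


if __name__ == "__main__":
    # Toy example: the K->N change discussed above turns a mismatch into a match.
    print(percent_identity("TKRY", "TNRY"))   # one mismatch in four columns
    print(percent_identity("TNRY", "TNRY"))   # all four columns identical
    # Assumed counts over the 199 aa overlap (inferred from the stated percentages).
    print(round(100.0 * 174 / 199, 1), round(100.0 * 175 / 199, 1))
```

Under that assumption, one additional identical position over the 199 aa overlap moves the figure from 87.4% to 87.9%, in agreement with the values stated above.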

The complete length ORF106a nucleotide sequence <SEQ ID 369> is: 

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT 

51 GCTGCCGATG CTTTCCGTTT TGCCGGACGC GGCGGCGGAG GGGATAGATG 

101 TGAGCCGCGC CGAAGCGAGG ATAANCGACG GCGGGCAGCT TTCCATNAGN 

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAANNNG CGNNGNGCCG 

201 GGGCGTGNCG CTCAACTNTA CCTTAAGNTG GCAGCTTTCC GCCCCGATAA 

251 TCGCTTCTTA TCGGTTTNAA TTGGGGCAAC TGATTGGCGA TGACGACNAT 

301 ATTGACTACA AACTGAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC 

351 CGTCGGCGCG TTTTCGACAG ANTACGACAC CTTGGATGCG GCATTGCGCG 

401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGCTGTCC 

451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC 

501 TTCAAAACTG CCCAAGCCTT TTCAAATCAA TGCATTGACT TCTCAAAACT 
551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 370>: 

1 MAFITRLFKS IKQWLVLLPM LSVLPDAAAE GIDVSRAEAR IXDGGQLSXX 

51 SRFQTELPDQ LQXAXXRGVX LNXTLXWQLS APIIASYRFX LGQLIGDDDX 

101 IDYKLSFHPL TNRYRVTVGA FSTXYDTLDA ALRATGAVAN WKVLNKGALS 

151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK* 



Homology with a predicted ORF from N. gonorrhoeae 

ORF106 shows 90.5% identity over a 199aa overlap with a predicted ORF (ORF106.ng) from N. gonorrhoeae: 

orf106.pep MAFITRLFKSSK-WLIVPLMLPAFQNVAAEGIDVSRAEARITDGGQLSISSRFQTELPDQ 59 

I I I I I I I I I I I ||:: : I : : ==11111 : : I I I I I I I I I I : I I ! I i I ! I I I I I I ! 

orf106ng MAFITRLFKSIKQWLVLLPILSVLPDAAAEGIAATRAEARITDGGRLSISSRFQTELPDQ 60 

orf106.pep LQQALRRGVPLNFTLSWQLSAPIIASYRFKLGQLIGDDDNIDYKLSFHPLTKRYRVTVGA 119 

I | | | | I I I I I I I I I I I I I I I I I I 1 ! I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I 
orf106ng LQQALRRGVPLNFTLSWQLSAPTIASYRFKLGQLIGDDDNIDYKLSFHPLTNRYRVTVGA 120 

orf106.pep FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 179 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I 1 I I ! I I I I I I I I I I I I I I I I I I M I I 
orf106ng FSTDYDTLDAALRATGAVANWKVLNKGALSGAEAGETKAEIRLTLSTSKLPKPFQINALT 180 

orf106.pep SQNWHLDSGWKPLNIIGNK 198 

I I I I I I I I I I I I I I I 1 11 I 
orf106ng SQNWHLDSGWKPLNIIGNK 199 

Due to the K->N substitution at residue 111, the homology between ORF106ng and ORF106-1 is 
91.0% over the same 199 aa overlap. 



The complete length ORF106ng nucleotide sequence <SEQ ID 371> is: 

1 ATGGCTTTTA TTACGCGCTT ATTCAAAAGC ATTAAACAAT GGCTTGTGCT 

51 GTTGCCGATA CTCTCCGTTT TGCCGGACGC GGCGGCGGAG GGCATTGCCG 

101 CGACCCGCGC CGAAGCGAGG ATAACCGACG GCGGGCGGCT TTCCATCAGC 

151 AGCCGCTTCC AAACCGAGCT GCCCGACCAG CTCCAACAGG CGTTGCGCCG 

201 GGGCGTACCG CTCAACTTTA CCTTAAGCTG GCAGCTTTCC GCCCCGACAA 

251 TCGCTTCTTA TCGGTTTAAA TTGGGGCAAC TGATTGGCGA TGACGACAAT 

301 ATTGACTACA AACTAAGTTT CCATCCGCTG ACCAACCGCT ACCGCGTTAC 
351 CGTCGGCGCA TTTTCCACCG ATTACGACAC TTTGGATGCG GCATTGCGCG 

401 CGACCGGCGC GGTTGCCAAC TGGAAAGTCC TGAACAAAGG CGCGTTGTCC 
451 GGTGCGGAAG CAGGGGAAAC CAAGGCGGAA ATCCGCCTGA CGCTGTCCAC 
501 TTCAAAACTG CCCAAGCCTT TCCAAATCAA CGCATTGACT TCTCAAAACT 
551 GGCATTTGGA TTCGGGTTGG AAACCTCTAA ACATCATCGG GAACAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 372>: 



1 MAFITRLFKS IKQWLVLLPI LSVLPDAAAE GIAATRAEAR ITDGGRLSIS 

51 SRFQTELPDQ LQQALRRGVP LNFTLSWQLS APTIASYRFK LGQLIGDDDN 

101 IDYKLSFHPL TNRYRVTVGA FSTDYDTLDA ALRATGAVAN WKVLNKGALS 

151 GAEAGETKAE IRLTLSTSKL PKPFQINALT SQNWHLDSGW KPLNIIGNK* 

Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF106-1 (18 kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
13A shows the results of affinity purification of the His-fusion protein, and Figure 13B shows the 
results of expression of the GST-fusion in E. coli. Purified His-fusion protein was used to immunise 
mice, whose sera were used for FACS analysis (Figure 13C). These experiments confirm that 
ORF106-1 is a surface-exposed protein, and that it is a useful immunogen. 



Example 44 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
373>: 

1 ATGGACACAA AAGAAATCCT CGG.TACGCG GcAGGcTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCc TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG 

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTgACGGTG 

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CACCGCCGAC AAAGACAcCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 

301 TCTGAAATCC TGTTTTCACT CGACGATGCC gCCGCCGGCa TCGGGCTGGT 

351 GCTGTTTGAA CtGAGCTTCC TGCCCATCCG cTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCcTT GCCTTTTCGT CCGCGCAACT CGTGCcCAAG 

451 CTCGCCATCC TGCTGCTG.T GCCGCTGACG GTCGGGCTGC TGCACTTTCC 

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGG.TGC GCTACGGCAT 

651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC 

851 CCGCTCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGC. TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATG.TGCCGC 

1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTT 

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG ACCGTGCCGT ACCGGCGAGG CCGCC.GGCG 

1151 CGGCGGTTGC CTGTGCCGCC TCATTCTGGC TGTTTTTTGC CTTCAAGACC 

1201 GAAAGCTCyT GCCGCCTGTG GCAGCCGCTC AAACGCCTGC CGCTTTATCT 
1251 GCACACATTG TTCTGCCTGA CCTCCTCGGC GGCCTACACC TGCTTCGGCA 

1301 CGCCGGCAAA CTATCCCCTG TTTGCCGGCG TATGGGCGGC ATATCTGGCA 
1351 GGCTGCATCC TGCGCCACCG GAAAGATTTG CACAAACTGT TTCATTATTT 

1401 GAAAAAACAA GGTTTCCCAT TATGA 

This corresponds to the amino acid sequence <SEQ ID 374; ORF10>: 

1 MDTKEILXYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP 

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK 

151 LAILLLXPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

201 HAPFSPAVLH RGXRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS 

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS 

301 ALCXTGIFSP LASLLLPENY AAVRFIVVSC MXPPLFCTLA EISGIGLNVV 

351 RKTRPIALAT LGALAANLLL LGLDRAVPAR PXGAAVACAA SFWLFFAFKT 

401 ESSCRLWQPL KRLPLYLHTL FCLTSSAAYT CFGTPANYPL FAGVWAAYLA 

451 GCILRHRKDL HKLFHYLKKQ GFPL* 

Further sequence analysis revealed the complete DNA sequence <SEQ ID 375> to be: 

1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCCGCCG 

101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG 
151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CACCGCCGAC AAAGACACCT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 

301 TCTGAAATCC TGTTTTCACT CGACGATGCC GCCGCCGGCA TCGGGCTGGT 

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAG 

451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC 

501 AGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CACGCACCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT 

651 ACCGATCGCA CTGAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTGG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGAA AACGCCCCGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTTGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCGC 

1001 CGCTGTTTTG CACGCTGGCG GAAATCAGCG GCATCGGTTT GAACGTCGTC 

1051 CGCAAAACGC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTGCCGTC CGGCGGCGCG CGCGGCGCGG 

1151 CGGTTGCCTG TGCCGCCTCA TTCTGGCTGT TTTTTGCCTT CAAGACCGAA 

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATCTGCA 

1251 CACATTGTTC TGCCTGACCT CCTCGGCGGC CTACACCTGC TTCGGCACGC 

1301 CGGCAAACTA TCCCCTGTTT GCCGGCGTAT GGGCGGCATA TCTGGCAGGC 

1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA 

1401 AAAACAAGGT TTCCCATTAT GA 

This corresponds to the amino acid sequence <SEQ ID 376; ORF10-1>: 

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 
51 SVLCLGLDQA YVREYYATAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP 
101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK 
151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR 
201 HAPFSPAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS 
251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEE NAPPARLSAT AESAAALLAS 
301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLA EISGIGLNVV 
351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFAFKTE 
401 SSCRLWQPLK RLPLYLHTLF CLTSSAAYTC FGTPANYPLF AGVWAAYLAG 
451 CILRHRKDLH KLFHYLKKQG FPL* 

Computer analysis of this amino acid sequence gave the following results: 
Prediction 

ORF10-1 is predicted to be the precursor of an integral membrane protein, since it comprises 
several (12-13) potential transmembrane segments, and a probable cleavable signal peptide. 
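
The text does not state which program produced the transmembrane prediction. As an illustration of how hydrophobic stretches of this kind are commonly flagged, the Python sketch below applies a Kyte-Doolittle sliding-window average. The method itself, the window of 19 residues, the threshold of 1.6 and the function names are all assumptions chosen for the example, not the analysis actually used.

```python
# Kyte-Doolittle hydropathy values for the twenty standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of every full window along the sequence."""
    # Unknown residues such as X contribute a neutral score of 0.0.
    values = [KYTE_DOOLITTLE.get(res, 0.0) for res in seq.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def candidate_tm_windows(seq: str, window: int = 19, threshold: float = 1.6) -> list:
    """1-based start positions of windows whose mean hydropathy reaches the threshold."""
    return [i + 1 for i, score in enumerate(hydropathy_profile(seq, window))
            if score >= threshold]


if __name__ == "__main__":
    # Residues 1-50 of SEQ ID 376 as listed above.
    n_terminus = "MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTV"
    print(candidate_tm_windows(n_terminus))
```

Runs of consecutive start positions returned by `candidate_tm_windows` mark hydrophobic stretches that would merit closer inspection; the number of segments found depends on the window and threshold chosen.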
Homology with EpsM from Streptococcus thermophilus (accession number U40830) 
ORF10 shows homology with the epsM gene of S. thermophilus, which encodes a protein of a size 
similar to ORF10 and is involved in exopolysaccharide synthesis. Other homologies are with 
prokaryotic membrane proteins: 



Identities = (25%) 

Query: 213 LRYGIPLALSSLAYWGLASADRLFLKKYAGLEQLGVYSMGISFGGAALLLQSIFSTVW 270 

L Y +PL SS+ +W L ++ R F+ + G G+ ++ + +IF+ W 

Sbjct: 210 LYYALPLIPSSILWWLLNASSRYFVLFFLGAGANGLLAVATKIPSIISIFNTIFTQAW 267 



Identities = 15/57 (26%), Positives = 31/57 (54%) 

Query: 7 LGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQAYVR 63 

L + G++GS +L +++PL ++ + G L QT A L + ++ + + A +R 

Sbjct: 12 LVFTIGNLGSKLLVFLLVPLYTYAMTPQEYGMADLYQTTANLLLPLITMNVFDATLR 68 
Identities = 16/96 (16%), Positives = 36/96 (37%) 



Query: 307 IFSPLASLLLPENYAAVRFTVVSCMLPPLFYTLTEISGIGLNVVRKTRPIXXXXXXXXXX 366 

+ p+ ++ +YA+ V ML LF + ++ G ++T+ + 

Sbjct: 305 VLKPIVEKWSSDYASSWQYVPFFMLSMLFSSFSDFFGTNYIAAKQTKGVFMTSIYGTIV 364 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF10 shows 95.4% identity over a 475aa overlap with an ORF (ORF10a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf10.pep MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
I | | | | | | | | | | I I I I I I I I I I 1 I I I I I I I i I I I I I I I I I I I I I I M I I I I I I I I I I I I I 
orf10a MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf10.pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 

! I | | I I I : I I I I I I I II I I I I I I I I I I I I I I I I M I I I I I I I I I I MINIMI 

orf10a YVREYYAAADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf10.pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 

M I I I I I II II I II II Mill I II M M I I M II II II II II II II II I M 

orf10a LSFLPIRFLLLVLRMEGRALAFSSAQLVSKLAILLLLPLTVGLLHFPANTAVLTAVYALA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf10.pep NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY 
M I I I I I I I I I I I I M I I I I : M I I I I II II II M II II II I I I II I II I I I I I I I II 
orf10a NLAAAAFLLFQNRCRLKAVRRAPFSSAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf10.pep AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 
M II II II II II II I II I I M II I 11 I I I I II I I I I I I I I I I I II I I M I II I I I I I I I 
orf10a AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEANAPPARLSATAESAAALLAS 

250 260 270 280 290 300 



310 320 330 340 350 360 

orf10.pep ALCXTGIFSPLASLLLPENYAAVRFIVVSCMXPPLFCTLAEISGIGLNVVRKTRPIALAT 
Ml I I I I I I I II II I I II II M M M M M II II II I : II I I M M M M M M M II 
orf10a ALCLTGIFSPLASLLLPENYAAVRFIVVSCMLPPLFCTLVEISGIGLNVVRKTRPIALAT 

310 320 330 340 350 360 



370 380 390 400 410 419 

orf10.pep LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT 
M I I I I I I II II I III: M I II I I M I II II : \ II II II II I II II I I II I : II 

orf10a LGALAANLLLLGL--AVPSGGARGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 

370 380 390 400 410 



420 430 440 450 460 470 

orf10.pep LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
II I I : I I I I II I II II II I II I II I I I : I M II I II II II II II II M I I I I I I 1 I 

orf10a LFCLASSAAYTCFGTPANYPLFAGVWAVYLAGCILRHRKDLHKLFHYLKKQGFPLX 
420 430 440 450 460 470 

The complete length ORF10a nucleotide sequence <SEQ ID 377> is: 



1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC 

51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCCCTGCCG 

101 ACGACATCGG ACGCATCGTG CTGATGCAGA CGGCGGCGGG GCTGACGGTG 

151 TCGGTGTTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 

201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 

251 TGTCTGCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC ATCCCTGCCG 
301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT 

351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 

401 GTATGGAAGG ACGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGTCCAAG 

451 CTCGCCATCC TGCTGCTGCT GCCGCTGACG GTCGGGCTGC TGCACTTTCC 

501 GGCGAACACC GCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 

551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 

601 CGCGCACCGT TTTCATCCGC CGTCCTGCAT CGCGGCCTGC GCTACGGCAT 

651 ACCGATCGCA CTAAGCAGCA TCGCCTATTG GGGGCTGGCA TCCGCCGACC 

701 GTTTGTTCCT GAAAAAATAT GCCGGCCTAG AACAGCTCGG CGTTTATTCG 

751 ATGGGTATTT CGTTCGGCGG AGCGGCATTA TTGTTCCAAA GCATCTTTTC 

801 AACGGTCTGG ACACCGTATA TTTTCCGCGC AATCGAAGCA AACGCCCCGC 

851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 

901 GCCCTCTGCC TGACCGGCAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC 

951 GGAAAACTAC GCCGCCGTCC GGTTTATCGT CGTATCGTGT ATGCTGCCTC 

1001 CGCTGTTTTG CACGCTGGTA GAAATCAGCG GCATCGGTTT GAACGTCGTC 

1051 CGAAAAACAC GCCCGATCGC GCTCGCCACC TTGGGCGCGC TGGCGGCAAA 

1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCGCG CGCGGCGCGG 

1151 CGGTTGCCTG TGCCGCCTCA TTTTGGCTGT TTTTTGTTTT CAAGACCGAA 

1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA 

1251 CACATTGTTC TGCCTGGCCT CCTCGGCGGC CTACACCTGC TTCGGCACTC 

1301 CGGCAAACTA CCCCCTGTTT GCCGGCGTAT GGGCGGTATA TCTGGCAGGC 

1351 TGCATCCTGC GCCACCGGAA AGATTTGCAC AAACTGTTTC ATTATTTGAA 

1401 AAAACAAGGT TTCCCATTAT GA 

This encodes a protein having amino acid sequence <SEQ ID 378>: 

1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 

51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLSAAAIA ALLLSRPSLP 

101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVSK 

151 LAILLLLPLT VGLLHFPANT AVLTAVYALA NLAAAAFLLF QNRCRLKAVR 

201 RAPFSSAVLH RGLRYGIPIA LSSIAYWGLA SADRLFLKKY AGLEQLGVYS 

251 MGISFGGAAL LFQSIFSTVW TPYIFRAIEA NAPPARLSAT AESAAALLAS 

301 ALCLTGIFSP LASLLLPENY AAVRFIVVSC MLPPLFCTLV EISGIGLNVV 

351 RKTRPIALAT LGALAANLLL LGLAVPSGGA RGAAVACAAS FWLFFVFKTE 

401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAVYLAG 

451 CILRHRKDLH KLFHYLKKQG FPL* 

ORF10a and ORF10-1 show 95.4% identity in 475 aa overlap: 

10 20 30 40 50 60 

orf10-1.pep MDTKEILXYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf10a MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf10-1.pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
I I I I [ I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 M I I I I I I I I i I I I I I I 
orf10a YVREYYAAADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf10-1.pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 
1 I 1 I I I I II I 1 I I I I M I I I I I I I I I I I I I I I I I I I I I I II M 11 I I I I I I I I I I M I 

orf10a LSFLPIRFLLLVLRMEGRALAFSSAQLVSKLAILLLLPLTVGLLHFPANTAVLTAVYALA 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf10-1.pep NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY 

I I I I I I I I I II I I I I I I I I I : I I I I MINI I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf10a NLAAAAFLLFQNRCRLKAVRRAPFSSAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf10-1.pep AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 

II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I 
orf10a AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEANAPPARLSATAESAAALLAS 

250 260 270 280 290 300 


310 320 330 340 350 360 
orf10-1.pep ALCXTGIFSPLASLLLPENYAAVRFIVVSCMXPPLFCTLAEISGIGLNVVRKTRPIALAT 
III | | || | | | | | I M I I I I I I I I 1 I I I I I 1 I I 1 I 1 I I : I 1 I 1 I I I 1 I I I I I I I 1 I I I I 
orf10a ALCLTGIFSPLASLLLPENYAAVRFIVVSCMLPPLFCTLVEISGIGLNVVRKTRPIALAT 
310 320 330 340 350 360 

370 380 390 400 410 419 

orf10-1.pep LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT 
MM! I I I Ml: ! I I I 1 I I I I I I I I I : I I I I I I 1 I I M I 1 I I I N I : I I 
orf10a LGALAANLLLLGL--AVPSGGARGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 
370 380 390 400 410 

420 430 440 450 460 470 

orf10-1.pep LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
I I I I = I I M M I I I I M M I I M I I I I : I I I I I M M I I I I I I I I I I I I I I I I I I I 
orf10a LFCLASSAAYTCFGTPANYPLFAGVWAVYLAGCILRHRKDLHKLFHYLKKQGFPLX 
420 430 440 450 460 470 



Homology with a predicted ORF from N. gonorrhoeae 

ORF10 shows 94.1% identity over a 475aa overlap with a predicted ORF (ORF10.ng) from N. gonorrhoeae: 



MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 






orf10ng.pep YVREYYAAADKDTLFKTLFLPPLLFSAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 120 

1:111 I I I I I I I : I II M I I I I I I I I I II I I II I I I I 

orf10nm YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 120 

orf10ng.pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTSVLTAVYALA 180 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I : I I I I I I I I I 

orf10nm LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLXPLTVGLLHFPANTAVLTAVYALA 180 

orf10ng.pep NLAAAAFLLFQNRCRLKAVRRAPFSPAVLHRGLRYGIPLALSSLAYWGLASADRLFLKKY 240 

I | | | | I I I I I I I II I I I I II : I I I I I I I I I M I I I I I : I I I I : I M I I I I I I I I II I I I 

orf10nm NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGXRYGIPIALSSIAYWGLASADRLFLKKY 240 

orf10ng.pep AGLEQLGVYSMGISFGGAALLLQSIFSTVWTPYIFRAIEENATPARLSATAESAAALLAS 300 

M I I I I I I I I I I I I I I I I I I I : I I I II I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I 

orf10nm AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 300 

orf10ng.pep ALCLTGIFSPLASLLLPENYAAVRFTVVSCMLPPLFYTLTEISGIGLNVVRKTRPIALAT 360 

III I I I I I II I I I I I I I I I I I I I I I II I I I I II I I : I I II I II I I I I I I I I I I I I I 

orf10nm ALCXTGIFSPLASLLLPENYAAVRFIVVSCMXPPLFCTLAEISGIGLNVVRKTRPIALAT 360 



370 380 390 400 410 

orf10ng.pep LGALAANLLLLGL--AVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHT 
I I I I I I I I I I I I I III: I I I I I I I I M I I I I : I I I I I I I I ! I I I I I I I I II : I I 
orf10nm LGALAANLLLLGLDRAVPAR-PXGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHT 
370 380 390 400 410 



420 430 440 450 460 470 

orf10ng.pep LFCLASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX 
I II I : I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I : I M I I I I I I I I M M I 
orf10nm LFCLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
420 430 440 450 460 470 

The complete length ORF10ng nucleotide sequence <SEQ ID 379> is: 



1 ATGGACACAA AAGAAATCCT CGGCTACGCG GCAGGCTCGA TCGGCAGCGC 
51 GGTTTTAGCC GTCATCATCC TGCCGCTGCT GTCGTGGTAT TTCcccgCCG 
101 ACGACATCGG GCGCATCGTG CTGATGCAGA CGGCGGCGGG ACTGACGGTG 
151 TCGGTATTGT GCCTCGGGCT GGATCAGGCA TACGTCCGCG AATACTATGC 
201 CGCCGCCGAC AAAGACACTT TGTTCAAAAC CCTGTTCCTG CCGCCGCTGC 
251 TGTTTTCCGC CGCGATAGCC GCCCTGCTGC TTTCCCGCCC GTCCCTGCCG 
301 TCTGAAATCC TGTTTTCGCT CGACGATGCC GCCGCCGGCA TCGGGCTGGT 
351 GCTGTTTGAA CTGAGCTTCC TGCCCATCCG CTTTCTCTTA CTGGTTTTGC 
401 GTATGGAAGG GCGCGCCCTT GCCTTTTCGT CCGCGCAACT CGTGCCCAAA 
451 CTCGCCATTC TGCTGCTGTT GCCGCTGACG GTCGGGCTGC TGCACTTTCC 
501 GGCGAACACC TCCGTCCTGA CCGCCGTTTA CGCGCTGGCA AACCTTGCCG 
551 CCGCCGCCTT TTTGCTGTTT CAAAACCGAT GCCGTCTGAA GGCCGTCCGG 
601 CGCGCGCCGT TTTCGCCCGC CGTCCTGCAC CGGGGGCTGC GCTACGGCAT 
651 ACCGCTCGCA CTGAGCAGCC TTGCCTATTG GGGGCTGGCA TCCGCCGACC 
701 GTTTGTTCCT GAAAAAATAT GCGGGCCTGG AACAGCTCGG CGTTTATTCG 
751 ATGGGTATTT CGTTCGGCGG GGCGGCATTA TTGCTCCAAA GCATCTTTTC 
801 AACGGTCTGG ACACCGTATA TTTTCCGTGC AATCGAAGAA AACGCCACGC 
851 CCGCCCGCCT CTCGGCAACG GCAGAATCCG CCGCCGCCCT GCTTGCCTCC 
901 GCCCTCTGCC TGACCGGAAT TTTCTCGCCC CTCGCCTCCC TCCTGCTGCC 
951 GGAAAACTAC GCCGCCGTCC GGTTTACCGT CGTATCGTGT ATGCTGccgc 
1001 cgctGTTTTA CACGCTGACC GAAATCAGCG GCATCGGTTT GAACGTCGTC 
1051 CGCAAAACGC GTCCGATCGC GCTTGCCACC TTGGGCGCGC TGGCGGCAAA 
1101 CCTGCTGCTG CTGGGGCTTG CCGTACCGTC CGGCGGCACG CGCGGCGCGG 
1151 CGGTTGCCTG TGCCGCCTCA TTCTGGTTGT TTTTTGTTTT CAAGACAGAA 
1201 AGCTCCTGCC GCCTGTGGCA GCCGCTCAAA CGCCTGCCGC TTTATATGCA 
1251 CACATTGTTC TGCCTgGCCT CCTCGGCGGC CTACACCTGC TTCGGCACAC 
1301 CGGCAAACTA CCCcctgttt gccggcgtAT GGGCGGCATA TCTGGCAGGC 
1351 TGCATCCTGC GCCACCGGAA AAATTTGCAC AAACTGTTTC ATTATTTGAA 
1401 AAAACAAGGT TTCCCATTAT GA 



This encodes a protein having amino acid sequence <SEQ ID 380>: 



1 MDTKEILGYA AGSIGSAVLA VIILPLLSWY FPADDIGRIV LMQTAAGLTV 
51 SVLCLGLDQA YVREYYAAAD KDTLFKTLFL PPLLFSAAIA ALLLSRPSLP 
101 SEILFSLDDA AAGIGLVLFE LSFLPIRFLL LVLRMEGRAL AFSSAQLVPK 
151 LAILLLLPLT VGLLHFPANT SVLTAVYALA NLAAAAFLLF QNRCRLKAVR 
201 RAPFSPAVLH RGLRYGIPLA LSSLAYWGLA SADRLFLKKY AGLEQLGVYS 
251 MGISFGGAAL LLQSIFSTVW TPYIFRAIEE NATPARLSAT AESAAALLAS 
301 ALCLTGIFSP LASLLLPENY AAVRFTVVSC MLPPLFYTLT EISGIGLNVV 
351 RKTRPIALAT LGALAANLLL LGLAVPSGGT RGAAVACAAS FWLFFVFKTE 
401 SSCRLWQPLK RLPLYMHTLF CLASSAAYTC FGTPANYPLF AGVWAAYLAG 
451 CILRHRKNLH KLFHYLKKQG FPL* 



ORF10ng and ORF10-1 show 96.4% identity in 473 aa overlap: 
orf10-1.pep MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
1 I f M I I I I I I II II I N I I I i I [ I I I i I I I M M II I 1 I II II II II I II M I I I II II 
orf10ng-1 MDTKEILGYAAGSIGSAVLAVIILPLLSWYFPADDIGRIVLMQTAAGLTVSVLCLGLDQA 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf10-1.pep YVREYYATADKDTLFKTLFLPPLLSAAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
I M I I I I : I I I II I I I I I I I I 1 I I : I | I I ] || | | | | | | | | | | | | | | | | | | M [ | I] | | | 
orf10ng-1 YVREYYAAADKDTLFKTLFLPPLLFSAAIAALLLSRPSLPSEILFSLDDAAAGIGLVLFE 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf10-1.pep LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTAVLTAVYALA 
I I N I I I I I I I I II II I I I I I || I | | || | 1 | | | | | | | | || | | | | | | || || : || | | | | | || 
orf10ng-1 LSFLPIRFLLLVLRMEGRALAFSSAQLVPKLAILLLLPLTVGLLHFPANTSVLTAVYALA 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf10-1.pep NLAAAAFLLFQNRCRLKAVRHAPFSPAVLHRGLRYGIPIALSSIAYWGLASADRLFLKKY 
M N II I I M M I I I I I I M : I I I M I I I I 1 I II I I I I : I I I I : I II I I I I I I I I I I I I I 
orf10ng-1 NLAAAAFLLFQNRCRLKAVRRAPFSPAVLHRGLRYGIPLALSSLAYWGLASADRLFLKKY 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf10-1.pep AGLEQLGVYSMGISFGGAALLFQSIFSTVWTPYIFRAIEENAPPARLSATAESAAALLAS 

I N I II I I I I I I I : | | || || | | | | | || | | | | | | I I I I I I I I M I I I 

orf10ng-1 AGLEQLGVYSMGISFGGAALLLQSIFSTVWTPYIFRAIEENATPARLSATAESAAALLAS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf10-1.pep ALCLTGIFSPLASLLLPENYAAVRFIVVSCMLPPLFCTLAEISGIGLNVVRKTRPIALAT 
orf10ng-1 ALCLTGIFSPLASLLLPENYAAVRFTVVSCMLPPLFYTLTEISGIGLNVVRKTRPIALAT 
310 320 330 340 350 360 



370 380 390 400 410 420 

orf10-1.pep LGALAANLLLLGLAVPSGGARGAAVACAASFWLFFAFKTESSCRLWQPLKRLPLYLHTLF 
I ! I ! I 1 I I I I I I I I I I I I I : I I I M I I I I I I I I I I : I I I I I I I I I I I I I I M I I I : I I I I 
orf10ng-1 LGALAANLLLLGLAVPSGGTRGAAVACAASFWLFFVFKTESSCRLWQPLKRLPLYMHTLF 
370 380 390 400 410 420 



430 440 450 460 470 

orf10-1.pep CLTSSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKDLHKLFHYLKKQGFPLX 
II : 1 I I I I II II I I I I I I I I I ! I I i I M I I I I I I I N : M M M I I M I II I I I 
orf10ng-1 CLASSAAYTCFGTPANYPLFAGVWAAYLAGCILRHRKNLHKLFHYLKKQGFPLX 

430 440 450 460 470 



Based on this analysis, including the presence of a putative leader peptide and several 
transmembrane segments and the presence of a leucine-zipper motif (4 Leu residues spaced by 6 
aa, shown in bold), it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
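
The leucine-zipper motif referred to above (four Leu residues, each separated from the next by six residues) corresponds to the pattern L-x(6)-L-x(6)-L-x(6)-L. The Python sketch below shows one way such a pattern can be located in a protein sequence. It is illustrative only: the function name `find_leucine_zippers` is hypothetical, and no claim is made about which program was used for the original analysis or about the positions it reported.

```python
import re

# L-x(6)-L-x(6)-L-x(6)-L; the lookahead allows overlapping occurrences to be reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def find_leucine_zippers(seq: str) -> list:
    """Return (1-based start position, matched 22-residue stretch) for every occurrence."""
    return [(m.start() + 1, m.group(1)) for m in LEUCINE_ZIPPER.finditer(seq.upper())]


if __name__ == "__main__":
    # Synthetic test sequence (not taken from the specification).
    demo = "AAA" + "LAAAAAA" * 3 + "L" + "AA"
    for start, stretch in find_leucine_zippers(demo):
        print(start, stretch)
```

A match to this pattern alone is only suggestive, since the spacing can occur by chance in leucine-rich membrane proteins.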

Example 45 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 381>: 

1 . . ATCCTGAAAC CGCATAACCA GCTTAAGGAA GACATCCAAC CTGATCCGGC 

51 CGATCAAAAC GCCTTGTCCG AACCGGATGC TGCGACAGAG GCAGAGCAGT 

101 CGGATGCGGA AAATGCTGCC GACAAGCAGC CCGTTGCCGA TAAAGCCGAC 

151 GAGGTTGAAG AAAAGGCGGG CGAGCCGGAA CGGGAAGAGC CGGACGGACA 

201 GGCAGTGCGT AAGAAAGCGC TGACGGAAGA GCGTGAACAA ACCGTCAGGG 

251 AAAAAGCGCA GAAGAAAGAT GCCGAAACGG TTAAAATACA AGCGGTAAAA 

301 CCGTCTAAAG AAACAGAGAA AAAAGCTTCA AAAGAAGAGA AAAAGGCGGC 

351 GAAGGAAAAA GTTGCACCCA AACCAACCCC GGAACAAATC CTCAACAGCG 

401 GCAgCATCGA AAAitiGCGCGC AgTGCCGCCG CCAAAGAAGT GCAGAAAATG 

451 AA.AACGTCC GACAAGGCGG AAGC.AACGC ATTATCTGCA AATGGGCGCG 

501 TATGCCGACC GTCAGAGCGC GGAAGGGCAG CGTGCCAAAC TGGCAATCTT 

551 GGGCATATCT TCCAAGGTGG TCGGTTATCA GGCGGGACAT AAAACGCTTT 

601 ACCGGGTGCA AAGCGGCAAT ATGTCTGCCG ATGCGGTGA 

This corresponds to the amino acid sequence <SEQ ID 382; ORF65>: 

1 . . ILKPHNQLKE DIQPDPADQN ALSEPDAATE AEQSDAENAA DKQPVADKAD 
51 EVEEKAGEPE REEPDGQAVR KKALTEEREQ TVREKAQKKD AETVKIQAVK 
101 PSKETEKKAS KEEKKAAKEK VAPKPTPEQI LNSGSIEXAR SAAAKEVQKM 
151 XNVRQGGSXR IICKWARMPT VRARKGSVPN WQSWAYLPRW SVIRRDIKRF 
201 TGCKAAICLP MR* 

Further work revealed the complete nucleotide sequence <SEQ ID 383>: 

1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT 

51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC 

101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGCTTC GTCGAAGCAG 

151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT 

201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA 

251 CAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT 

301 GCCGATAAAG CCGACGAGGT TGAAGAAAAG GCGGGCGAGC CGGAACGGGA 

351 AGAGCCGGAC GGACAGGCAG TGCGTAAGAA AGCGCTGACG GAAGAGCGTG 

401 AACAAACCGT CAGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA 

451 AAACAAGCGG TAAAACCGTC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA 

501 AGAGAAAAAG GCGGCGAAGG AAAAAGTTGC ACCCAAACCA ACCCCGGAAC 

551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCCGCCAAA 

601 GAAGTGCAGA AAATGAAAAC GTCCGACAAG GCGGAAGCAA CGCATTATCT 

651 GCAAATGGGC GCGTATGCCG ACCGTCAGAG CGCGGAAGGG CAGCGTGCCA 
701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA 

751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT 

801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC 

851 GTTCTATCGA AAGCAAATAA 

This corresponds to the amino acid sequence <SEQ ID 384; ORF65-1>: 



1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPASSKQ 

51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAATEAEQSD AEKAADKQPV 

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK 

151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSGS IEKARSAAAK 

201 EVQKMKTSDK AEATHYLQMG AYADRQSAEG QRAKLAILGI SSKVVGYQAG 

251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF65 shows 92.0% identity over a 150aa overlap with an ORF (ORF65a) from strain A of N. 
meningitidis: 



10 20 30 

orf65.pep ILKPHNQLKEDIQPDPADQNALSEPDAATE 

I I I I : I I I I I I I I : I M I I M M M M I 
orf65a IIAGILFYLNQSGQNAFKIPVPSKQPAETEILKPKNQPKEDIQPEPADQNALSEPDAAKE 
30 40 50 60 70 80 



40 50 60 70 80 90 

orf65.pep AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 
I I I I I I I = I i I I I I I I I I I I I I I I I I Mill: I I I I I II I I I I I I I I I I I I I I I I I I 
orf65a AEQSDAEKAADKQPVADKADEVEEKADEPEREKSDGQAVRKKALTEEREQTVGEKAQKKD 
90 100 110 120 130 140 

100 110 120 130 140 150 

orf65.pep AETVKIQAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSGSIEXARSAAAKEVQKM 
I I I I I 1 I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I II I I I I || | | I | 
orf65a AETVKKQAVKPSKETEKKASKEEKKAEKEKVAPKPTPEQILNSGSIEKARSAAAKEVQKM 
150 160 170 180 190 200 



160 170 180 190 200 210 

orf65.pep XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP 

orf65a KTPDKAEATHYLQMGAYADRRSAEGQRAKLAILGISSKVVGYQAGHKTLYRVQSGNMSAD 
210 220 230 240 250 260 

The complete length ORF65a nucleotide sequence <SEQ ID 385> is: 

1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTTTT 

51 CTTCGGTTTG ATACTGGCGA CGGTCATTAT TGCCGGTATT TTGTTTTATC 

101 TGAACCAGAG CGGTCAAAAT GCGTTCAAAA TCCCGGTTCC GTCGAAGCAG

151 CCTGCAGAAA CGGAAATCCT GAAACCGAAA AACCAGCCTA AGGAAGACAT

201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGCTGCGA
251 AAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT

301 GCCGACAAAG CCGACGAGGT TGAGGAAAAG GCGGACGAGC CGGAGCGGGA
351 AAAGTCGGAC GGACAGGCAG TGCGCAAGAA AGCACTGACG GAAGAGCGTG

401 AACAAACCGT CGGGGAAAAA GCGCAGAAGA AAGATGCCGA AACGGTTAAA
451 AAACAAGCGG TAAAACCATC TAAAGAAACA GAGAAAAAAG CTTCAAAAGA
501 AGAGAAAAAG GCGGAGAAGG AAAAAGTTGC ACCCAAACCG ACCCCGGAAC
551 AAATCCTCAA CAGCGGCAGC ATCGAAAAAG CGCGCAGTGC CGCTGCCAAA
601 GAAGTGCAGA AAATGAAAAC GCCCGACAAG GCGGAAGCAA CGCATTATCT 
651 GCAAATGGGC GCGTATGCCG ACCGCCGGAG CGCGGAAGGG CAGCGTGCCA 
701 AACTGGCAAT CTTGGGCATA TCTTCCAAGG TGGTCGGTTA TCAGGCGGGA 
751 CATAAAACGC TTTACCGGGT GCAAAGCGGC AATATGTCTG CCGATGCGGT 
801 GAAAAAAATG CAGGACGAGT TGAAAAAACA TGAAGTCGCC AGCCTGATCC 
851 GTTCTATCGA AAGCAAATAA 



This encodes a protein having amino acid sequence <SEQ ID 386>:

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LFYLNQSGQN AFKIPVPSKQ

51 PAETEILKPK NQPKEDIQPE PADQNALSEP DAAKEAEQSD AEKAADKQPV 

101 ADKADEVEEK ADEPEREKSD GQAVRKKALT EEREQTVGEK AQKKDAETVK 

151 KQAVKPSKET EKKASKEEKK AEKEKVAPKP TPEQILNSGS IEKARSAAAK 

201 EVQKMKTPDK AEATHYLQMG AYADRRSAEG QRAKLAILGI SSKVVGYQAG

251 HKTLYRVQSG NMSADAVKKM QDELKKHEVA SLIRSIESK* 

ORF65a and ORF65-1 show 96.5% identity in 289 aa overlap: 

10 20 30 40 50 60 

orf 65a . pep MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPVPSKQPAETEILKPK 
I I I I I I I I I I I I I I I I I I I I I I I M I 11 I I I I I I I I I I I M I I I I = I I I ( I I I I I I I i I 
orf 65-1 MFMNKFSQS GKGLS GFFFGLILATVI I AGILFYLNQS GQNAFKI PAS SKQPAETE ILKPK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 65a . pep nqpkediqpepadqnalsepdaakeaeqsdaekaadkqpvadkadeveekadepereksd 

I I I I I I I I i I I I I I I I I I I I I I I I I I ! I I I I I I ! II I I I I I I I I I I I I I I Mill: I 
orf 65-1 nqpkediqpepadqnalsepdaateaeqsdaekaadkqpvadkadeveekagepereepd 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 65a . pep GQAVRKKALTEEREQTVGEKAQKKDAETVKKQAVKPSKETEKKASKEEKKAEKEKVAPKP 

II I I I M M M I I M M M II M N M I II M I I II I I I I I I I I I I I I I I Mil!! 

orf 65-1 GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 
130 140 150 160 170 180 

190 200 210 220 230 240 

orf 65a . pep TPEQILNSGS IEKARSAAAKEVQKMKTPDKAEATHYLQMGAYADRRSAEGQRAKLAILGI 
I I I I I I I M M II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I | I I | 
orf 65-1 TPEQILNSGSIEKARSAAAKEVQKMKTSDKAEATHYLQMGAYADRQSAEGQRAKLAILGI

190 200 210 220 230 240 

250 260 270 280 290 

orf 65a . pep SSKWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX 
M I II I I I I I I I I I I I I I I I I I I II II II I I 1 I I I I I I I I I I I I | I | | | | 
orf65-l S SKWGYQAGHKT LYRVQS GNMS ADAVKKMQDELKKHE VAS L IRSIESKX 

250 260 270 280 290 
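The identity percentages quoted for these alignments can be illustrated with a minimal sketch. It assumes the two sequences are already aligned to equal length, and it treats gap columns ('-') and unknown residues ('X') as non-matching; that scoring convention and the function name `percent_identity` are assumptions of this sketch, not a description of the alignment program used for the figures above. The example uses residues 1-50 of ORF65-1 and ORF65a as listed, where 48 of 50 positions agree.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned columns holding the same residue in both sequences.

    Assumes the inputs are already aligned (equal length, '-' for gaps).
    Gap columns and 'X' (unknown residue) are never counted as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    if not seq_a:
        raise ValueError("empty alignment")
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a not in "-X"
    )
    return 100.0 * matches / len(seq_a)


# Residues 1-50 of ORF65-1 (SEQ ID 384) and ORF65a (SEQ ID 386).
orf65_1 = "MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQ"
orf65a = "MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPVPSKQ"

print(f"{percent_identity(orf65_1, orf65a):.1f}% identity over {len(orf65_1)} aa")
```

Run over the full 289-column alignment, the same calculation would be expected to approach the 96.5% figure reported above, subject to how the original program scored the terminal stop position.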

Homology with a predicted ORF from N. gonorrhoeae 

ORF65 shows 89.6% identity over a 212aa overlap with a predicted ORF (ORF65.ng) from N.
gonorrhoeae:

30 40 50 60 70 80 

ORF65ng I IAGILLYLNQGGQNAFKI PAP SKQPAETE ILKLKNQPKEDIQPEPADQNALSEPDVAKE 

Ml : II M I I I I : I I I I II I II II : I I 
ORF65 I LKPHNQLKE D I Q P D PAD QN AL SEP D AAT E 

10 20 30 

90 100 110 120 130 140 

ORF65ng AEQSDAEKAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 

I M I M I : I I I I I I I I M II I M II I I I II I I I I I I I I I 1 I I I I I [ | [ | [ | || [ | | M | | 
ORF65 AEQSDAENAADKQPVADKADEVEEKAGEPEREEPDGQAVRKKALTEEREQTVREKAQKKD 
40 50 60 70 80 90 

150 160 170 180 190 200 

ORF65ng AETVKKKAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSRSIEKARSAAAKEVQKM 

I I I I I : I I I I I I I I M I I I I I I I I I I I I I I I I I I I M II I I I III I I I I I I I I I I I I 
ORF65 AETVKIQAVKPSKETEKKASKEEKKAAKEKVAPKPTPEQILNSGSIEXARSAAAKEVQKM
100 110 120 130 140 150 

210 220 230 240 250 260 

ORF65ng KN FGQGGS QRI I CKWARMPN PGARKG S VPNWQSWAYL PKWS AI RRDI KRFTACKAAI C PP 

I MM I M I i M I II : I M II I II M II II II M I : j I II II I I I : I I I II I I 
ORF65 XNVRQGGSXRIICKWARMPTVRARKGSVPNWQSWAYLPRWSVIRRDIKRFTGCKAAICLP 
160 170 180 190 200 210 




ORF65ng MR 
I I 

ORF65 MR 

An ORF65ng nucleotide sequence <SEQ ID 387> was predicted to encode a protein having amino
acid sequence <SEQ ID 388>:

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ

51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV 

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK 

151 KKAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS IEKARSAAAK 

201 EVQKMKNFGQ GGSQRIICKW ARMPNPGARK GSVPNWQSWA YLPKWSAIRR

251 DIKRFTACKA AICPPMR* 

After further analysis, the complete gonococcal DNA sequence <SEQ ID 389> was found to be: 

1 ATGTTTATGA ACAAATTTTC CCAATCCGGA AAAGGTCTGT CCGGTTTCTT 

51 CTTCGGTTTG ATACTGGCAA CGGTCATTAT TGCCGGTATT TTGCTTTATC 

101 TGAACCAGGG CGGTCAAAAT GCGTTCAAAA TCCCGGCTCC GTCGAAGCAG

151 CCTGCAGAAA CGGAAATCCT GAAACTGAAA AACCAGCCTA AGGAAGACAT 

201 CCAACCTGAA CCGGCCGATC AAAACGCCTT GTCCGAACCG GATGTTGCGA 

251 AAGAGGCAGA GCAGTCGGAT GCGGAAAAAG CTGCCGACAA GCAGCCCGTT 

301 GCCGACAAag ccgacgAGGT TGAAGAAAag GcGGgcgAgc cggaACGGga 

351 aGAGCCGGAC ggACAGGCAG TGCGCAAGAA AGCACTGAcg gAAGAgcGTG

401 AACAAACcgt cagggAAAAA GCGCagaaga AAGATGCCGA AACGgTTAAA 

451 AAacaaGCgg tAaaaccgtc tAAAGAAACa gagaaaaaag cTtcaaaaga 

501 agagaaaaag gcggcgaaag aaaAAGttgc acccaaaccg accccggaaC 

551 aaatcctcaa cagccgCagc atcgaaaaag cgcgtagtgc cgctgccaaa 

601 gaAgtgcaGA AAatgaaaaa ctTtgggcaa ggcgGaagcc aacgcattaT

651 CTGcaaatgg gcgcgtatgc cgaccgtccg gagcgcggaA gggcagcgtg 

701 ccaaACtggc aAtcttgGgc atatctTccg aagtggtcgG CTATCAGGCG 

751 GGACATAAAA CGCTTTACCG CGTGCAAagc GGCAatatgt ccgccgatgc 

801 gGTGAAAAAA ATGCAGGACG AGTTGAAAAA GCATGGGGtt gcCAGCCTGA

851 TCCGTGcgAT TGAAGGCAAA TAA

This encodes the following amino acid sequence <SEQ ID 390>: 

1 MFMNKFSQSG KGLSGFFFGL ILATVIIAGI LLYLNQGGQN AFKIPAPSKQ

51 PAETEILKLK NQPKEDIQPE PADQNALSEP DVAKEAEQSD AEKAADKQPV

101 ADKADEVEEK AGEPEREEPD GQAVRKKALT EEREQTVREK AQKKDAETVK

151 KQAVKPSKET EKKASKEEKK AAKEKVAPKP TPEQILNSRS IEKARSAAAK

201 EVQKMKNFGQ GGSQRIICKW ARMPTVRSAE GQRAKLAILG ISSEVVGYQA

251 GHKTLYRVQS GNMSADAVKK MQDELKKHGV ASLIRAIEGK * 

ORF65ng-1 and ORF65-1 show 89.0% identity in 290 aa overlap:

10 20 30 40 50 60 

orf65-1.pep MFMNKFSQSGKGLSGFFFGLILATVIIAGILFYLNQSGQNAFKIPASSKQPAETEILKPK

] I I I I I [ [ I I I I I II I I I I I I I I I I I I I I I I : I I I I : I I I I I I 11 I I I I I I I I I I I I I 
orf 65ng-l MFMNKFSQSGKGLSGFFFGLILATVIIAGILLYLNQGGQNAFKIPAPSKQPAETEILKLK 

10 20 30 40 50 60 

45 70 80 90 100 110 120 

orf 65-1. pep NQPKEDIQPE PADQNALSEPDAATEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 
I M I 1 I II I IS I 1 I I I I I I I I : I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 65ng-l NQPKEDIQPEPADQNALSEPDVAKEAEQSDAEKAADKQPVADKADEVEEKAGEPEREEPD 
70 80 90 100 110 120 

50 

130 140 150 160 170 180 

or f 65-1 . pep GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 

I I I I I I I I I I I II I I I II I I I I II I I I II I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 
orf 65ng-l GQAVRKKALTEEREQTVREKAQKKDAETVKKQAVKPSKETEKKASKEEKKAAKEKVAPKP 

55 130 140 150 160 170 180 

190 200 210 220 230 239 

orf 65-1 -pep TPEQILNSGSIEKARSAAAKEVQKMKTSDKAEATHYL-QMGAYADRQSAEGQRAKLAILG 

II I I I I I I I I I I I I I I I I I I I I II I : : : : : : : : : : I I I I I I I I I M I I 
orf65ng-1 TPEQILNSRSIEKARSAAAKEVQKMKNFGQGGSQRIICKWARMPTVRSAEGQRAKLAILG
190 200 210 220 230 240

240 250 260 270 280 290 

orf65-l pep ISSKWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHEVASLIRSIESKX 
| | | : I I M I ! I I I I I I ! I ! I I I I I I I M I I I I I I I I I ! 111111:11 = 11 
orf65nq-l ISSEWGYQAGHKTLYRVQSGNMSADAVKKMQDELKKHGVASLIRAIEGKX 
250 260 270 280 290 

On this basis, including the presence of a putative transmembrane domain in the gonococcal
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.
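This section does not state which method was used to identify the putative transmembrane domain. As an illustration only, the sketch below applies one widely used approach, a sliding-window average of Kyte-Doolittle hydropathy values, to the first 60 residues of the gonococcal protein <SEQ ID 390>. The 19-residue window, the 1.6 cutoff, and the function names are assumptions of this sketch rather than parameters taken from the source.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein, window=19):
    """Mean hydropathy of every full window; unknown residues score 0.0."""
    scores = [KYTE_DOOLITTLE.get(res, 0.0) for res in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_starts(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose mean meets the threshold."""
    return [
        i + 1
        for i, mean in enumerate(hydropathy_windows(protein, window))
        if mean >= threshold
    ]


# Residues 1-60 of the gonococcal protein (SEQ ID 390).
n_term = "MFMNKFSQSGKGLSGFFFGLILATVIIAGILLYLNQGGQNAFKIPAPSKQPAETEILKLK"

print(candidate_tm_starts(n_term))
```

Windows that start within the hydrophobic stretch following residue 15 score well above the cutoff, which is consistent with a membrane-spanning segment near the N-terminus. A dedicated topology predictor would be needed for anything beyond a rough screen.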

Example 46 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
391>: 

1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTkTCTTCGG 

51 CGGAAcGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GcGTTTGs.s

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC

151 ACAGGACGGG TAAGCAGCTA TACGGCAAtC GGCCTGATAC TCGGATTAAT

201 CGGACAGGTC GGCGTTTCAC TCGAcCAaAC CCGCGTCCTG CAGAATATTT

251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAaATCGGCA AACCGATATG

351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC

401 CCGCCTGCCT tGCGgTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG 

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AgCGGTAGTG CGGCAACGGG 

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTtTAG 

551 CAATCGGCAT TTTtTCCCTG CAACTGAAwA AAATCATGCA AAACCGATAT 

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 

This corresponds to the amino acid sequence <SEQ ID 392; ORF103>: 

1 MNHDITFLTL FLLGXFGGTH CIGMCGGLSS AFXXQLPPHI NRFWLILLLN 

51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS 

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL 

151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLXKIMQNRY 

201 IRLCTGLSVS LWALWKLAVL WL* 
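The correspondence between a nucleotide listing and its amino acid sequence can be checked with a short translation sketch using the standard genetic code. The function name `translate` and the handling of ambiguous positions are assumptions of this sketch: any codon containing a character other than A, C, G or T (such as the `k`, `s`, `w` and `.` placeholders in the listings) is reported as X, whereas the listings themselves sometimes resolve such codons when every possibility encodes the same residue.

```python
# Standard genetic code, codons ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 1; whitespace is ignored, '*' marks a stop.

    Codons with ambiguous or missing bases are reported as 'X'.
    A trailing partial codon is dropped.
    """
    cleaned = "".join(dna.split()).upper()
    usable = len(cleaned) - len(cleaned) % 3
    return "".join(
        CODON_TABLE.get(cleaned[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First 48 nucleotides of SEQ ID 391, including the ambiguous base 'k'.
orf103_start = "ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTkTCTTC"

print(translate(orf103_start))
```

The output begins MNHDITFLTLFLLG and shows X at residue 15, matching the start of <SEQ ID 392>, where the ambiguous base leaves that residue undetermined.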

Further work elaborated the DNA sequence <SEQ ID 393> as: 



1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG 

51 CGGAACGCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC 

101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATCCT GCTGCTTAAC 

151 ACAGGACGGG TAAGCAGCTA TACGGCAATC GGCCTGATAC TCGGATTAAT 

201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCCTG CAGAATATTT
251 TATACACGGC CGCCAACCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG
351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC

401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG
451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG
501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTAG
551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT
601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 
651 TGCCGTCCTG TGGCTGTAA 

This corresponds to the amino acid sequence <SEQ ID 394; ORF103-1>: 



1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRFWLILLLN

51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVL QNILYTAANL LLLFLGLYLS

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL

151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLLAIGIFSL QLKKIMQNRY
201 IRLCTGLSVS LWALWKLAVL WL*

Computer analysis of this amino acid sequence gave the following results:

Homology with a predicted ORF from N. meningitidis (strain A)

ORF103 shows 93.8% identity over a 222aa overlap with an ORF (ORF103a) from strain A of N.
meningitidis:

10 20 30 40 50 60 
orf103.pep MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI
P P || | || | | | | 1 | || I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I 

orfl03a mnxditfltlfllgffggthcigmcgglssafalqlpphinrxwlilllntgrvssytai 

10 10 20 30 40 50 60 

70 80 90 100 110 120 

orfl03 pep glilgligqvgvsldqtrvlqnilytaanllllflglylsgisslaakiekigkpiwrnl 

1 1 II 1 1 1 1 1 1 II 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf103a glilgligqvgvsldqtrvxqnilytaanllllflglylsgisslaakiekigkpiwrnl

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 103 pep npilnrllpiksipaclavgilwgwlpcglvysaslyalgsgsaatgglymlafalgtlp 

20 I 1 I 1 I I I I I M I I M I II I II I I I II 1 I I I I I I I I I I I I I I I I I I I I I I I I 

orf 103a npilnrllpiksipaclavgilwgwlpcglvysaslyalgsgsaatgglymlafalgtlp 

130 140 150 160 170 180 

190 200 210 220 

orf103.pep nllaigifslqlxkimqnryirlctglsvslwalwklavlwlx

II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II II 1 1 1 1 1 1 

orf 103a NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 
190 200 210 220 

The complete length ORF103a nucleotide sequence <SEQ ID 395> is:

1 ATGAACCANG ACATCACTTT CCTCACCCTG TTCCTACTCG GTTTCTTCGG

51 cggaacgcac tgcatcggta tgtgcggcgg attaagcagc gcgtttgcgc 

101 TCCAACTCCC CCCGCATATC AACCGCTTNT GGCTGATCCT GCTGCTTAAC 

151 acaggacggg taagcagcta tacggcaatc ggcctgatac tcggattaat 

201 CGGACAGGTC GGCGTTTCAC TCGACCAAAC CCGCGTCNTG CAGAATATTT 

251 tatacacggc cgccaacctc ctgctgctct ttttaggctt ATACTTGAGC

301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG

351 GCGGAACCTG AACCCGATAC TCAACCGGCT GTTACCCATA AAATCCATAC

401 CCGCCTGCCT TGCGGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTA

451 GTTTACAGCG CGTCGCTTTA CGCGCTGGGA AGCGGTAGTG CGGCAACGGG

501 CGGGTTATAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTNGG

551 CAATCGGCAT TTTTTCCCTG CAACTGNAAA AAATCATGCA AAACCGATAT

601 ATCCGCCTGT GTACGGGATT ATCCGTATCA TTATGGGCAT TATGGAAACT 

651 TGCCGTCCTG TGGCTGTAA 

This encodes a protein having amino acid sequence <SEQ ID 396>: 

1 MNXDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRXWLILLLN

51 TGRVSSYTAI GLILGLIGQV GVSLDQTRVX QNILYTAANL LLLFLGLYLS

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL

151 VYSASLYALG SGSAATGGLY MLAFALGTLP NLXAIGIFSL QLXKIMQNRY

201 IRLCTGLSVS LWALWKLAVL WL* 

ORF103a and ORF103-1 show 97.7% identity in 222 aa overlap:

10 20 30 40 50 60 

orf 103a pep MNXDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRXWLILLLNTGRVSSYTAI 

II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II II 1 1 II 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II I 

orf 103-1 MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRVSSYTAI

70 80 90 100 110 120

orf 103a pep GLILGLIGQVGVSLDQTRVXQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL

| | | | ! M I M I I I I II I I I M I I I I M I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf 103-1 GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL
130 140 150 160 170 180 

orfl03a pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 

| | M 1 | 1 | I || I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 103-1 NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 
130 140 150 160 170 180

190 200 210 220 

orfl03a pep NLXAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 
|| I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 
orf103-1 NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX

190 200 210 220 

Homology with a predicted ORF from N. gonorrhoeae

ORF103 shows 95.5% identity over a 222aa overlap with a predicted ORF (ORF103.ng) from N.
gonorrhoeae:

orfl03 pep MNHDITFLTLFLLGXFGGTHCIGMCGGLSSAFXXQLPPHINRFWLILLLNTGRVSSYTAI 60 

| I I I I I I I I I I I M I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I 
orf 103ng MNHDITFLTLFLLGFFGGTHCIGMCGGLSSAFALQLPPHINRFWLILLLNTGRISSYTAI 60 

orf 103 pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120

I I : M I II I : I : I I II I I I I I I I I I I I : I I I I I I I I I I M I I I I I I I I I I I I I I I I M I I 

orf 103ng GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 120 

orf 103 .pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 180 

30 I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I : I I I I I I I I I I I II I I 

orfl03ng NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP 180 

orf 103. pep NLLAIGIFSLQLXKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222

II II I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I M I I I 

orf103ng NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWL 222

The complete length ORF103ng nucleotide sequence <SEQ ID 397> is:

1 ATGAACCACG ACATCACTTT CCTCACCCTG TTCCTGCTCG GTTTCTTCGG
51 CGGAACTCAC TGCATCGGTA TGTGCGGCGG ATTAAGCAGC GCGTTTGCGC
101 TCCAACTCCC CCCGCATATC AACCGCTTTT GGCTGATTCT GCTGCTTAAC
151 ACAGGACGGA TAAGCAGCTA TACGGCAATC GGCCTGATGC TCGGATTAAT
201 CGGACAACTC GGCATTTCAC TCGACCAAAc ccgcgTCCTG CAAAATATTT
251 tatacacagc ctccaaCCTC CTGCTGCTCT TTTTAGGCTT ATACTTGAGC
301 GGTATTTCTT CCTTGGCGGC AAAAATCGAG AAAATCGGCA AACCGATATG
351 GCGCAACCTG AACCCGATAC TCAACCGGCT GCTGCCCATA AAATCCATAC
401 CCGCCTGCCT TGCTGTCGGA ATATTATGGG GCTGGCTGCC GTGCGGACTG
451 GTTTACAGCG CATCACTTTA CGCGCTGGGA AGCGGTAGTG CGACAACCGG
501 CGGACTGTAT ATGCTTGCCT TTGCACTGGG TACGCTGCCC AATCTTTTGG
551 CAATCGGCAT TTTTTCCCTG CAACTGAAAA AAATCATGCA AAACCGATAT
601 ATCCGCCTGT GTACAGGATT ATCCGTATCA TTATGGGCAT TATGGAAGCT
651 TGCCGTCCTG TGGCTGTAA

This encodes a protein having amino acid sequence <SEQ ID 398>: 

1 MNHDITFLTL FLLGFFGGTH CIGMCGGLSS AFALQLPPHI NRFWLILLLN

51 TGRISSYTAI GLMLGLIGQL GISLDQTRVL QNILYTASNL LLLFLGLYLS

101 GISSLAAKIE KIGKPIWRNL NPILNRLLPI KSIPACLAVG ILWGWLPCGL

151 VYSASLYALG SGSATTGGLY MLAFALGTLP NLLAIGIFSL QLKKIMQNRY

201 IRLCTGLSVS LWALWKLAVL WL* 

In addition, ORF103ng and ORF103-1 show 97.3% identity in 222 aa overlap:

orf103ng mnhditfltlfllgffggthcigmcgglssafalqlpphinrfwlilllntgrissytai

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 103-1 pep GLILGLIGQVGVSLDQTRVLQNILYTAANLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL 
||:MIIII:I:IIIIIMMIIIIII:IMIMII1IIIIIMMMIMMIII1III 

orf103ng GLMLGLIGQLGISLDQTRVLQNILYTASNLLLLFLGLYLSGISSLAAKIEKIGKPIWRNL
70 80 90 100 110 120

130 140 150 160 170 180 

orf 103-1 pep NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSAATGGLYMLAFALGTLP 

I [ | | M I I M I I I I I I I I I I I I M I I I I I I I I I I I I I I M M I I I M I ! I ! I I I I II ! I 

orf103ng NPILNRLLPIKSIPACLAVGILWGWLPCGLVYSASLYALGSGSATTGGLYMLAFALGTLP

130 140 150 160 170 180 

190 200 210 220 

orfl03-l pep NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 
I | | I I I I I II I I I I I I M I I I M I I I I I I I I I I I I H I I I I I I 
orf 103ng NLLAIGIFSLQLKKIMQNRYIRLCTGLSVSLWALWKLAVLWLX 

190 200 210 220 

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 47 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 399>: 

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTT CGCTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGAT.TCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCcGAAGC GGCG^GGATT 

201 TTTCTTGGTG CTCATTCAGG CTGCTGCTGC TCGGCGTGGC GGGCATTTCG 

251 GCAAACTTTG TGCTGATTGC CCAAGGGCTG CATTATATTT CGCCGACCAC 

301 GACGCAGGTT TTGTGGCAGA TTTCGCCGTT TACGATGATT GTwGTCGGTG 

351 TGTTGGTGTT TAAAGACCGG ATGACTGCCG CTCAGAAAAT CGGCTTGGTT 

401 TTGCTGCTTG CCGGTTTGCT TATGTATTTT AACGATAAAT TCGGCGAGTT

451 GTCGGGTTTG GGCGCGTATG C.AAGGGCGT GTTGCTGTGT GCGGCAGGCA 

501 GTATGGCATG GGTGTGTAAT GCCGTGGCGC AAAAGCTGCT GTCGGCGCAA 

551 TTCGGGCCGC AACAGATTCT GCTGTTGATT TATGCGGCAA GTGCCGCCGT 

601 GTTCCTGCCG TTTGCCGAAC CGGCACACAT CGGAAGTATG GACGGTACGT 

651 TGGCGTGGGT ATGTATTGCG TATTGCTGCT TGAATACGTT AATCGGTTAC 

701 GGCTCGTTCG GCGAGGCGTT GAAACATTGG GAGGCTTCCA AAGTCAGCGC

751 GGTAACAACC TTGCTCCCCG TGTTTACCGT AATAAATACT TTGCTCGGGC

801 ATTATGTGAT GCCTGAAACT TTTGCCGCGC CGGA..

This corresponds to the amino acid sequence <SEQ ID 400; ORF104>: 



1 MENQRPLLGF RLALLAAMTW GTLPXSVRQV LKFVDAPTLV WVRFTVAAAV 

51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT 

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MYFNDKFGEL

151 SGLGAYXKGV LLCAAGSMAW VCNAVAQKLL SAQFGPQQIL LLIYAASAAV 

201 FLPFAEPAHI GSMDGTLAWV CIAYCCLNTL IGYGSFGEAL KHWEASKVSA 

251 VTTLLPVFTV INTLLGHYVM PETFAAP... 

Further work revealed a further partial DNA sequence <SEQ ID 401>:



1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCACT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT 

201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG
251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG

301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT

351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT 

401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG 

451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG

501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT 

551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG 

601 TTCCTGCCGT TTGCCGAACC GGCACACATC GGAAGTTTGG ACGGTACGTT 

651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG 

701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG

751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATAwTwwCTT TGCTCGGGCA 

801 TTATGTGATG CCTGAAACTT TTGCCGCGCC GGA. . . 

This corresponds to the amino acid sequence <SEQ ID 402; ORF104-1>: 

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKRRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLPFAEPAHI GSLDGTLAWV CFAYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IXXLLGHYVM PETFAAP...

Computer analysis of this amino acid sequence gave the following results:

Homology with hypothetical HI0878 protein of H. influenzae (accession number U32769) 
ORF104 and HI0878 show 40% aa identity in 277aa overlap: 

orf104 4 QRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWXXXXXXXXXXXXXXXXXXXXP- 62
Q+PLLGF AL+ AM WG+LP +++QVL ++A T+VW P
HI0878 3 QQPLLGFTFALITAMAWGSLPIALKQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62

orf104 63 --KRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 120
K R ++W ++L+GV G+++NF+L + L+YI P+ Q+ +S F M++ GVL+F
HI0878 63 LMKVRQYAW----IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIF 118

orf104 121 KDRMTAAQKIXXXXXXXXXXMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 180
K+++ QKI ++FND+F +GL Y GV+L G++ WV +AQKL+
HI0878 119 KEKLGLHQKIGLFLLLIGLGLFFNDRFDAFAGLNQYSTGVILGVGGALIWVAYGMAQKLM 178

orf104 181 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 240
+F QQILL++Y A F+P A+ + + + LA +C YCCLNTLIGYGS+ EAL
HI0878 179 LRKFNSQQILLMMYLGCAIAFMPMADFSQVQELT-PLALICFIYCCLNTLIGYGSYAEAL 237

orf104 241 KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 277
W+ SKVS V TL+P+FT++ + + HY P FAAP
HI0878 238 NRWDVSKVSVVITLVPLFTILFSHIAHYFSPADFAAP 274

Homology with a predicted ORF from N. meningitidis (strain A)

ORF104 shows 95.3% identity over a 277aa overlap with an ORF (ORF104a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 104 . pep MENQRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 
I I I I I I I I I I ill : I 1 I ! I I I I I I I I I I I I I I I I I I I I I I M I I I I I I 
orf 104a MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 
50 10 20 30 40 50 60 

70 80 90 100 110 120 

orf 104 .pep LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF
III I I ! I I I I I I I I I I II I I I II I I I I II I I I I I I II I I I I I I I I I I I I I I I I II I I I I
orf 104a LPKWRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 104 .pep KDRMTAAQKIGLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL
I | | | | 1 | 1 I I I I I I 1 1 I I I I I : I I I I I I I I I I M ,11 I I I M I I

orf 104a KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL
130 140 150 160 170 180

190 200 210 220 230 240

orf 104 .pep SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL

| | | | | | | | | | | I I I [ I I I I I I I I I I I 1 11 I I : I I I I I I I I : M I I I II I II I M I I I I I
orf 104a SAQFGPQQILLLIYAASAAVFLPFAELAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL
190 200 210 220 230 240

250 260 270

orf 104 .pep KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP
I I I I II I I I I I I I I 1 I M I I I : I I I M I I I : I I I II

orf 104a KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYAGALVVVGGAVTAAVG

The complete length ORF104a nucleotide sequence <SEQ ID 403> is: 

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC 

51 GATGACGTGG GGAACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG 

101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA 

151 TTGTTTGTTT TGCTGGCATT GGGCGGGCGG CTGCCGAAGT GGCGGGATTT 

201 TTCTTGGTGC TCATTCAGGC TGCTGCTGCT CGGCGTGGCG GGCATTTCGG 

251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG

301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGTGT

351 GTTGGTGTTT AAAGACCGGA TGACTGCCGC TCAGAAAATC GGCTTGGTTT 

401 TGCTGCTTGC CGGTTTGCTT ATGTTTTTTA ACGATAAATT CGGCGAGTTG 

451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG 

501 TATGGCATGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT

551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGCAAG TGCCGCCGTG 

601 TTCCTGCCGT TTGCCGAACT GGCACACATC GGAAGTTTGG ACGGTACGTT 

651 GGCGTGGGTT TGTTTTGCGT ATTGCTGCTT GAATACGTTA ATCGGTTACG 

701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG

751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA 

801 TTATGTGATG CCTGATACTT TTGCCGCGCC GGATATGAAC GGTTTGGGTT 

851 ATGCCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG 

901 GACAGGCTGT TCAAACGCCG CTAG 

This encodes a protein having amino acid sequence <SEQ ID 404>:


1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKWRDFSWC SFRLLLLGVA GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLAGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLPFAELAHI GSLDGTLAWV CFAYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYAGALVV VGGAVTAAVG

301 DRLFKRR* 

ORF104a and ORF104-1 show 98.2% identity in 277 aa overlap: 



10 20 30 40 50 60 

orf 104a . pep MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 
! 1 I i I II II I I I I I I I M I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
orf 104-1 MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 104a. pep LPKWRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 
III I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I 1 I 
orf 104-1 LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIWGVLVF 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 104a . pep KDRMTAAQKI GLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 
I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I II II I I I II 1 1 1 U M I I I I I I I 1 I I I 
or f 1 04 -1 KDRMTAAQKI GLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 

130 140 150 160 170 180 



190 200 210 220 230 240
orf 104a.pep SAQFGPQQILLLIYAASAAVFLPFAELAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL

| | | | I 1 1 I I I I I i I I 1 1 I I I I I I I I I I I I I I I I I Hill

orf 104-1 SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL
190 200 210 220 230 240 

250 260 270 280 290 300 

orf 104a pep KHWEASKVSAVTTLLPVFTVIFSLLGHYVMPDTFAAPDMNGLGYAGALWVGGAVTAAVG 

I | | I 1 M I I I I I I I M 1 I I I I 11111111:11111 
orf 104-1 KHWEASKVSAVTTLLPVFTVIXXLLGHYVMPETFAAP 
250 260 270 

Homology with a predicted ORF from N. gonorrhoeae 

ORF104 shows 93.9% identity over a 277aa overlap with a predicted ORF (ORF104.ng) from N.
gonorrhoeae:



orf 10 4 pep MENQRPLLGFRLALLAAMTWGTLPXSVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 60 

I | | | I I || I I II I I I I I M I I 1 I : I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf 10 4ng MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR 60 

orf104 pep LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 120

M I I I I I I I I I I ! I I I I I : II I M I I I I I I I I I I II I I I I I I II I I I M I I I I I I I I M 

orf 104ng LPKRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 120 

orf 104 pep KDRMTAAQKIGLVLLLAGLLMYFNDKFGELSGLGAYXKGVLLCAAGSMAWVCNAVAQKLL 18 0 

I I I I I I I I I I I I I I I I : II I I : I I I I I I M I I I I I I I II II II I I I I I I II I I I I I I I 

orf 1 0 4ng KDRMTAAQKIGLVLLLVGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL 18 0 

orf 10 4 pep SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSMDGTLAWVCIAYCCLNTLIGYGSFGEAL 240 

| I I I I I I I I 1 II I I I I I I I I I I I I I I I I I I : I I I I I I I I :: I I I I I I I I I II I I II M 

or f 10 4ng SAQFGPQQILLLIYAASAAVFLLXAEPAHIGSLDGTLAWVCFVYCCLNTLIGYGSFGEAL 24 0 

orf 104 .pep KHWEASKVSAVTTLLPVFTVINTLLGHYVMPETFAAP 277

orf104ng KHWEASKVSAVTTLLPVFTVIFSLLGHYVM 300



The complete length ORF104ng nucleotide sequence <SEQ ID 405> is predicted to encode a
protein having amino acid sequence <SEQ ID 406>:

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKRRDFSWH SFRLLLLGVT GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLVGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLLXAEPAHI GSLDGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYVGALVV VGGAVTAAVG

301 DRPFKRR* 

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 407>:

1 ATGGAAAACC AAAGGCCGCT CCTAGGCTTC GCGTTGGCAC TTTTGGCGGC
51 GATGACGTGG GGGACGCTGC CGATTGCCGT GCGGCAGGTA TTGAAGTTTG
101 TCGATGCGCC GACGCTGGTG TGGGTGCGTT TTACCGTGGC GGCGGCGGTA
151 TTGTTTGTTT TGCTGGCATT GGGCGGGCGG CTGCCGAAGC GGCGGGATTT
201 TTCTTGGCAT TCATTCAGGC TGCTGCTGCT CGGCGTGACG GGCATTTCGG
251 CAAACTTTGT GCTGATTGCC CAAGGGCTGC ATTATATTTC GCCGACCACG
301 ACGCAGGTTT TGTGGCAGAT TTCGCCGTTT ACGATGATTG TTGTCGGCGT
351 GTTGGTGTTT AAAGACCGGA tgaCTGCCGC GCAGAAAATC GGTTTGGTTT
401 TGCTGCttgT CGGTttgCTT ATGTTTTtta ACGACAAATT CGGCGAGTTG
451 TCGGGTTTGG GCGCGTATGC GAAGGGCGTG TTGCTGTGTG CGGCAGGCAG
501 TATGGCCTGG GTGTGTTATG CCGTGGCGCA AAAGCTGCTG TCGGCGCAAT
551 TCGGGCCGCA ACAGATTCTG CTGTTGATTT ATGCGGcaag tgccgccGTG
601 TTCCtgccgT TTGccgaaCC GGCACACATC GGAAGTTTgg aCGGTACGtt
651 GGCGTGGGTT TGTTTTGTGT ATTGCTGCTT GAATACGTTA ATCGGTTACG
701 GCTCGTTCGG CGAGGCGTTG AAACATTGGG AGGCTTCCAA AGTCAGCGCG
751 GTAACAACCT TGCTCCCCGT GTTTACCGTA ATATTTTCTT TGCTCGGGCA
801 TTATGTGATG CCTGATACTT TTGCCGCGCC GGATATGAAC GGTTTGGGTT
851 ATGTCGGCGC ACTGGTCGTG GTCGGGGGTG CGGTTACGGC GGCGGTGGGG
901 GACAGGCCGT TCAAACGCCG CTAG

This corresponds to the amino acid sequence <SEQ ID 408; ORF104ng-1>:

1 MENQRPLLGF ALALLAAMTW GTLPIAVRQV LKFVDAPTLV WVRFTVAAAV

51 LFVLLALGGR LPKRRDFSWH SFRLLLLGVT GISANFVLIA QGLHYISPTT

101 TQVLWQISPF TMIVVGVLVF KDRMTAAQKI GLVLLLVGLL MFFNDKFGEL

151 SGLGAYAKGV LLCAAGSMAW VCYAVAQKLL SAQFGPQQIL LLIYAASAAV

201 FLPFAEPAHI GSLDGTLAWV CFVYCCLNTL IGYGSFGEAL KHWEASKVSA

251 VTTLLPVFTV IFSLLGHYVM PDTFAAPDMN GLGYVGALVV VGGAVTAAVG

301 DRPFKRR* 

ORF104ng-1 and ORF104-1 show 97.5% identity in 277 aa overlap:

10 20 30 40 50 60

orf104-1.pep MENQRPLLGFALALLAAMTWGTLPIAVRQVLKFVDAPTLVWVRFTVAAAVLFVLLALGGR
orf104ng-1 MENQRPLLGFALALLAAMTWGT

70 80 90 100 110 120

orf104-1.pep LPKRRDFSWCSFRLLLLGVAGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF
i I I I I I I I I I I I ! I I I I I : I I I I I I I I I I I I I II I ! I I I I II I [ I I I I I I I I I I I I I I I
orf104ng-1 LPKRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF

70 80 90 100 110 120

130 140 150 160 170 180

orf104-1.pep KDRMTAAQKIGLVLLLAGLLMFFNDKFGELSGLGAYAKGVLLCAAGSMAWVCYAVAQKLL

190 200 210 220 230 240

orf104-1.pep SAQFGPQQILLLIYAASAAVFLPFAEPAHIGSLDGTLAWVCFAYCCLNTLIGYGSFGEAL

In addition, ORF104ng-1 shows significant homology with a hypothetical H. influenzae protein: 

gi|1573895 (U32769) hypothetical [Haemophilus influenzae] Length = 306 
Score = 237 bits (598), Expect = 8e-62 
Identities = 114/280 (40%), Positives = 168/280 (59%), Gaps = 8/280 (2%) 

Query: 30 QRPXXXXXXXXXXXMTWGTLPIAVRQVLKFVDAPTLVWXXXXXXXXXXXXXXXXXXXXP- 88 
 Q+P M WG+LPIA++QVL ++A T+VW P 
Sbjct: 3 QQPLLGFTFALITAMAWGSLPIALKQVLSVMNAQTIVWYRFIIAAVSLLALLAYKKQLPE 62 

Query: 89 --KRRDFSWHSFRLLLLGVTGISANFVLIAQGLHYISPTTTQVLWQISPFTMIVVGVLVF 146 
 K R ++W ++L+GV G+++NF+L + L+YI P+ Q+ +S F M++ GVL+F 
Sbjct: 63 LMKVRQYAW IMLIGVIGLTSNFLLFSSSLNYIEPSVAQIFIHLSSFGMLICGVLIF 118 

Query: 147 
 +GL Y+ GV+L G++ WV Y +AQKL+ 
Sbjct: 119 

Query: 207 
 +F QQILL++Y LA +CF+YCCLNTLIGYGS+ EAL 
Sbjct: 179 

Query: 267 
 W+ SKVS V TL+P+FT++FS + HY P FAAP++N 
Sbjct: 238 NRWDVSKVSVVITLVPLFTILFSHIAHYFSPADFAAPELN 277 

Based on this analysis, including the presence of a putative leader sequence and several putative 
transmembrane domains in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 

Example 48 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 409>: 

1 ATGGTAGCTC GTCGGGCTCA TAACCCGAAG GTCGTAGGTT CGAATCCTGT 

51 .CCCGCAACC TAATTTCAAA CCCCTCGGTT CAATGCCGAG GG.GTTTTGT 

101 T.TTGCCTGT TTCCTGTTTC CTGTTTCCTG CCGCCTCCGT TTTTTGCCGG 

151 ATTTTCCTTC CGGCCGCAAT ATCGGAACGG CAGACCGCCG TCTGTTTGCG 

201 GTTGCAAATT CAGGCAGTTT GGCTACAATC TTCCGCATTG TCTTCAAGAA 

251 AGCCAACCAT GCCGACCGTC CGTTTTACCG AATCCGTCAG CAAACAAGAC 

301 CTTGATGCTC TGTTCGAGTG GGCAAAAGCA AGTTACGGTG CAGAAAGTTG 

351 CTGGAAAACG CTGTATCTGA ACGGTCysCC TTTGGGCAAC CTGTCGCCGG 

401 AATGGGTGGA ACGCGTsmraA AAAGACTGGG AGGCAGGCTG CyCGGAGTCT 

451 TCAGACGGCA TTTTTCTGAA TgCGGACGGc TGgCctGATA TGGgCGGAcg 

501 cTTACAGCAC CTCGCCCTCG GTTGGCACTG TGCGGGGCTG TTGGACGgsT 

551 GGCGCAACGA GTGTTTCGAC CTGACCGACG GCGGCGGCAA CCCCTTGTTC 

601 ACGCTCGaAc GCGCCGyTTT mCGTCCTkTC GGACTGCTCA GCCGCGCCGT 

651 CCATCTCAAC GGTCTGACCG AATCGGACGG CCGATGGCAT TTCTGGATAG 

701 GCAGGCGCAG TCCGCACAAA GCAGTCGATC CCAACAAACT CGACAATACT 

751 rCCGCCGGCG GTGTTTCCGG CGGCGAAATG CCGTCTGAAG CCGTGTGTCG 

801 CGAAAGCAGC GAAGAAGCCG GTTTGGATAA AACGCTGcTT CCGCTCATCC 

851 GCCCGGTATC GCAGCTGCAC AGCCTGCGCT CCGTCAGCCG GGGTGTACAC 

901 AATGAAATCC TGTATGTATT CGATGCCGTC CTGCCG. . . 

This corresponds to the amino acid sequence <SEQ ID 410; ORF105>: 

1 MVARRAHNPK VVGSNPXPAT XFQTPRFNAE XVLXLPVSCF LFPAASVFCR 

51 IFLPAAISER QTAVCLRLQI QAVWLQSSAL SSRKPTMPTV RFTESVSKQD 

101 LDALFEWAKA SYGAESCWKT LYLNGXPLGN LSPEWVERVX KDWEAGCXES 

151 SDGIFLNADG WPDMGGRLQH LALGWHCAGL LDGWRNECFD LTDGGGNPLF 

201 TLERAXXRPX GLLSRAVHLN GLTESDGRWH FWIGRRSPHK AVDPNKLDNT 

251 XAGGVSGGEM PSEAVCRESS EEAGLDKTLL PLIRPVSQLH SLRSVSRGVH 

301 NEILYVFDAV LP... 

Further work revealed the complete nucleotide sequence <SEQ ID 411>: 

1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC 

51 TCTGTTCGAG TGGGCAAAAG CAAGTTACGG TGCAGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ACCTGTCGCC GGAATGGGTG 

151 GAACGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG 

201 CATTTTTCTG AATGCGGACG GCTGGCCTGA TATGGGCGGA CGCTTACAGC 

251 ACCTCGCCCT CGGTTGGCAC TGTGCGGGGC TGTTGGACGG CTGGCGCAAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA 

351 ACGCGCCGCT TTCCGTCCTT TCGGACTGCT CAGCCGCGCC GTCCATCTCA 

401 ACGGTCTGAC CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC 

451 AGTCCGCACA AAGCAGTCGA TCCCAACAAA CTCGACAATA CTGCCGCCGG 

501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGT CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA 

601 TCGCAGCTGC ACAGCCTGCG CTCCGTCAGC CGGGGTGTAC ACAATGAAAT 

651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG 

751 GATGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT 

801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 


This corresponds to the amino acid sequence <SEQ ID 412; ORF105-1>: 
1 MPTVRFTESV SKQDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWV 

51 ERVKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLALGWH CAGLLDGWRN 

101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLTESD GRWHFWIGRR 

151 SPHKAVDPNK LDNTAAGGVS GGEMPSEAVC RESSEEAGLD KTLLPLIRPV 

201 SQLHSLRSVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 DAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L* 
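
The correspondence between a nucleotide listing and its amino acid listing can be checked with a short script. The following is a minimal sketch, assuming the standard genetic code and translation in the first reading frame; the names `CODON_TABLE` and `translate` are illustrative and do not come from the application. Any codon containing an ambiguity code or an unread position is reported as X, which is consistent with the X residues shown in the partial sequences above.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string in frame 1; unresolved codons become 'X'."""
    # Remove layout whitespace and ignore upper/lower case differences.
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        # Codons with ambiguity codes or '.' are not in the table.
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of SEQ ID 411 as listed above.
print(translate("ATGCCGACCG TCCGTTTTAC CGAATCCGTC"))  # MPTVRFTESV
```

The printed residues agree with the first ten residues of SEQ ID 412.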

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF105 shows 89.4% identity over a 226aa overlap with an ORF (ORF105a) from strain A of N. 
meningitidis: 

60 70 80 90 100 110 

orfl05.pep ISERQTAVCLRLQIQAVWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWAKASYGAES 

I I I I I I I I I I M : I I I I I I I I I I I II I I I I 
orfl05a MPTVRFTESVSKHDLDALFEWAKASYGAES 

10 20 30 



120 130 140 150 160 170 

orf 105 . pep CWKTLYLNGX PLGNLS PEWVERVXKDWEAGCXE S S DGI FLNADGWPDMGGRLQHLALGWH 
I I I I I II I I 111111111:111 I I I I I I I I I I I I I I I I I I I I II I I MINI I : 
orf 105a CWKTLYLNGL PLGNLS PEWAERVKKDWEAGCSESS DGI FLNADGWPDMGRRLQHLARIWK 

40 50 60 70 80 90 



180 190 200 210 220 230 

orf 105 . pep CAGLLDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRR 
I I I I I I I : I I I I I I I II : 11 I I : I I I I II I I I I I II I I I I I : I I I I I I I I I I I I I 
orf 105a EAGLLHGWRDECFDLTDGGSNPLFALERAAFRPFGLLSRAVHLNGLVESDGRWHFWIGRR 
100 110 120 130 140 150 



240 250 260 270 280 290 

orf 105 . pep SPHKAVDPNKLDNTXAGGVSGGEMPSEAVCRESSEEAGLDKTLLPLIRPVSQLHSLRSVS 
I I I I I I I I : I I I I I I I I I I : I I : I I I : I I I I I I I II II I I I I I I I 1 I I I I I I I I I I II 
orf 105a SPHKAVDPDKLDNTAAGGVSSGELPSETVCRESSEEAGLDKTLLPLIRPVSQLHSLRPVS 
160 170 180 190 200 210 



orf 105 .pep 
orfl05a 



The complete length ORF105a nucleotide sequence <SEQ ID 413> is: 

1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACACG ACCTTGATGC 

51 CCTATTCGAG TGGGCAAAGG CAAGTTACGG TGCGGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACGGTCTG CCTTTGGGCA ATCTGTCGCC GGAATGGGCG 

151 GAGCGCGTCA AAAAAGACTG GGAGGCAGGC TGCTCGGAGT CTTCAGACGG 

201 CATTTTCCTG AATGCGGACG GCTGGCCAGA TATGGGCAGA CGCTTGCAGC 

251 ACCTCGCCCG AATATGGAAA GAAGCGGGAC TGCTTCACGG CTGGCGCGAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCAGC AATCCCTTGT TCGCGCTCGA 

351 ACGCGCCGCT TTCCGTCCGT TCGGACTGCT CAGCCGCGCC GTCCATCTCA 

401 ACGGTTTGGT CGAATCGGAC GGCCGATGGC ATTTCTGGAT AGGCAGGCGC 

451 AGTCCGCACA AAGCAGTCGA TCCCGACAAA CTCGACAATA CTGCCGCCGG 

501 CGGTGTTTCC AGCGGTGAAT TGCCGTCTGA AACCGTGTGT CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGC TTCCGCTCAT CCGCCCGGTA 

601 TCGCAGCTGC ACAGCCTGCG CCCCGTCAGC CGGGGTGTGC ACAATGAAAT 

651 CCTGTATGTA TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

701 AGGATGGCGA AGTGGCGGGT TTTGAGAAAA TGGACATCGG CGGTCTGTTG 

751 GCTGCCATGT TGTCGGGAAA CATGATGCAC GACGCGCAAC TGGTTACGCT 

801 GGACGCGTTT TGCCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 

This encodes a protein having amino acid sequence <SEQ ID 414>: 

1 MPTVRFTESV SKHDLDALFE WAKASYGAES CWKTLYLNGL PLGNLSPEWA 

51 ERVKKDWEAG CSESSDGIFL NADGWPDMGR RLQHLARIWK EAGLLHGWRD 

101 ECFDLTDGGS NPLFALERAA FRPFGLLSRA VHLNGLVESD GRWHFWIGRR 

151 SPHKAVDPDK LDNTAAGGVS SGELPSETVC RESSEEAGLD KTLLPLIRPV 

201 SQLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 AAMLSGNMMH DAQLVTLDAF CRYGLIDAAH PLSEWLDGIR L* 

ORF105a and ORF105-1 show 93.8% identity in 291 aa overlap: 

10 20 30 40 50 60 

orf 105a . pep MPTVRFTESVSKHDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWAERVKKDWEAG 

orf 105-1 MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 105a. pep CSESSDGIFLNADGWPDMGRRLQHLARIWKEAGLLHGWRDECFDLTDGGSNPLFALERAA 
I I I I I I I I I I I I I I I I I I I MINI I : MM M I : I M M M M M M I : M M I 
orf 105-1 CSESSDGIFLNADGWPDMGGRLQHLALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA 
70 80 90 100 110 120 



130 140 150 160 170 180 

orf 105a. pep FRPFGLLSRAVHLNGLVESDGRWHFWIGRRSPHKAVDPDKLDNTAAGGVS SGELPSETVC 
M M M M M M M M M M I M M M I M M M M M : M I M M M M M I : M I M I 
orf 105-1 FRPFGLLSRAVHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC 
130 140 150 160 170 180 



190 200 210 220 230 240 

orf 105a. pep RESSEEAGLDKTLLPLIRPVSQLHSLRPVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 
M M M M M I M M M ! M M I M M M I M M M M M M M M M M M M M M I 
orf 105-1 RESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVHNEILYVFDAVLPETFLPENQDGEVAG 

190 200 210 220 230 240 



250 260 270 280 290 

orf 105a. pep FEKMDIGGLLAAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX 
I M M M M I M M M I M M M M M I [ I M M M M M M M M M M I 
orf 105-1 FEKMDIGGLLDAMLSGNMMHDAQLVTLDAFCRYGLIDAAHPLSEWLDGIRLX 

250 260 270 280 290 



Homology with a predicted ORF from N. gonorrhoeae 

ORF105 shows 87.5% identity over a 312aa overlap with a predicted ORF (ORF105.ng) from N. 
gonorrhoeae: 



orf 105 . pep MVARRAHNPKWGSNPXPATXFQT PRFNAEXVLXLPVSCFLFPAASVFCRIFLPAAISER 60 

M M M M M M M M Ml M M M M I I I M M M M M M M M M M I 

orfl05ng MVARRAHNPKWGSNPAPATKYQTPRFNAEGVLF FLFPAASVFCRIFLPAAISER 55 

orf 105 .pep QTAVCLRLQIQAVWLQSSALSSRKPTMPTVRFTESVSKQDLDALFEWAKASYGAESCWKT 120 

I : I I I I I I I! I M M M M I M M M 1 M M M M M M M M M M M M M M M I 

orf 105ng QAAVCLRLQIQAVWLQSSALCSRKPAMPTVRFTESVSKQDLDALFERAKASYGAESCWKT 115 

orf 105 .pep LYLNGXPLGNLSPEWVERVXKDWEAGCXESSDGIFLNADGWPDMGGRLQHLALGWHCAGL 180 

MM M M M M I M I : M M M I M I M M M I M M M M M M M \ ■ Mi 

orfl05ng LYLNRLPLGNLSPEWAERIKKDWEAGCSESSNGIFLNADGWPDMGGRLQHLARTWNKAGL 175 

orf 105 .pep LDGWRNECFDLTDGGGNPLFTLERAXXRPXGLLSRAVHLNGLTESDGRWHFWIGRRSPHK 240 
I M M M M M M M M M M M I M Ml M M M M M M M M M M M M M 

orf 105ng LHGWRNECFDLTDGGGNPLFTLERAAFRPFGLLIRAVHLNGLVESNGRWHFW1GRRSPHK 235 

orf 105 .pep AVDPNKLDNTXAGGVSGGEMPSEAVCRESSEEAGLDKTLLPLIRPVSQLHSLRSVSRGVH 300 

M M : M M : M M M M M M I M M M M M M M M M M M M M M M M M 

orf lOSng AVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLRPVSRGVH 2 95 

orf 105. pep NEILYVFDAVLP 312 

M M M M M M 

orf 105ng NEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSKNMMHDAQLVTLDAFYRYG 355 

A complete length ORF105ng nucleotide sequence <SEQ ID 415> was predicted to encode a 
protein having amino acid sequence <SEQ ID 416>: 

1 MVARRAHNPK VVGSNPAPAT KYQTPRFNAE G VLFFLFPAA SVFCRIFL PA 

51 AISERQAAVC LRLQIQAVWL QSSALCSRKP AMPTVRFTES VSKQDLDALF 

101 ERAKASYGAE SCWKTLYLNR LPLGNLSPEW AERIKKDWEA GCSESSNGIF 

151 LNADGWPDMG GRLQHLARTW NKAGLLHGWR NECFDLTDGG GNPLFTLERA 

201 AFRPFGLLIR AVHLNGLVES NGRWHFWIGR RSPHKAVDPG KLDNIAGGGV 

251 SGGEMPSEAV CRESSEEAGL DKTLFPLIRP VSRLHSLRPV SRGVHNEILY 

301 VFDAVLPETF LPENQDGEVA GFEKMDIGGL LDAMLSKNMM HDAQLVTLDA 

351 FYRYGLI DAA HPLSEWLDGI RL* 

Further work revealed the complete nucleotide sequence <SEQ ID 417>: 

1 ATGCCGACCG TCCGTTTTAC CGAATCCGTC AGCAAACAAG ACCTTGATGC 

51 CCTGTTCGAG CGGGCAAAAG CAAGTTACGG TGCCGAAAGT TGCTGGAAAA 

101 CGCTGTATCT GAACCGTCTT CCTTTGGGCA ATCTGTCGCC GGAATGGGCT 

151 GAGCGCATCA AAAAAGACTG GGAGGCAGGC TGCTCCGAGT CTTCAGACGG 

201 CATTTTTCTG AATGCGGACG GCTGGCCGGA TATGGGCGGA CGCTTGCAGC 

251 ACCTCGCCCG CACATGGAAC AAGGCGGGGC TGCTTCACGG ATGGCGCAAC 

301 GAGTGTTTCG ACCTGACCGA CGGCGGCGGC AACCCCTTGT TCACGCTCGA 

351 ACGCGCCGCT TTCCGTCCGT TCGGACTACT CAGCCGCGCC GTCCATCTCA 

401 ACGGTTTGGT CGAATCGAAC GGCAGATGGC ATTTTTGGAT AGGCAGGCGC 

451 AGTCCGCACA AAGCAGTCGa tcCCGGCAAG CTCGACAATA TTGCCGGCGG 

501 CGGTGTTTCC GGCGGCGAAA TGCCGTCTGA AGCCGTGTGC CGCGAAAGCA 

551 GCGAAGAAGC CGGTTTGGAT AAAACGCTGT TTCCGCTCAT CCGCCCAGTA 

601 TCGCGGCTGC ACAGCCTTCG CCCCGTCAGC CGAGGTGTGC ACAATGAAAT 

651 CCTGTATGTG TTCGATGCCG TCCTGCCCGA AACCTTCCTG CCTGAAAATC 

701 AGGATGGCGA GGTAGCGGGT TTTGAAAAGA TGGACATTGG CGGCCTATTG 

751 GATGCCATGT TGTCGAAAAA CATGATGCAC GACGCGCAAC TGGTTACGCT 

801 GGACGCGTTT TACCGTTACG GTCTGATTGA TGCCGCCCAT CCGCTGTCCG 

851 AGTGGCTGGA CGGCATACGT TTATAG 

This corresponds to the amino acid sequence <SEQ ID 418; ORF105ng-1>: 

1 MPTVRFTESV SKQDLDALFE RAKASYGAES CWKTLYLNRL PLGNLSPEWA 

51 ERIKKDWEAG CSESSDGIFL NADGWPDMGG RLQHLARTWN KAGLLHGWRN 

101 ECFDLTDGGG NPLFTLERAA FRPFGLLSRA VHLNGLVESN GRWHFWIGRR 

151 SPHKAVDPGK LDNIAGGGVS GGEMPSEAVC RESSEEAGLD KTLFPLIRPV 

201 SRLHSLRPVS RGVHNEILYV FDAVLPETFL PENQDGEVAG FEKMDIGGLL 

251 DAMLSKNMMH DAQLVTLDAF YRYGLIDAAH PLSEWLDGIR L* 

ORF105ng-1 and ORF105-1 show 93.5% identity in 291 aa overlap: 

10 20 30 40 50 60 

MPTVRFTESVSKQDLDALFEWAKASYGAESCWKTLYLNGLPLGNLSPEWVERVKKDWEAG 
I I I I I I I I I I I I I i I I I I I ! I I I I M [ I I I I I I I I I I I I I I I I I I I I : I I : I I I I II I 
MPTVRFTESVSKQDLDALFERAKASYGAESCWKTLYLNRLPLGNLSPEWAERIKKDWEAG 

10 20 30 40 50 60 

70 80 90 100 110 120 

CSESSDGIFLNADGWPDMGGRLQHLALGWHCAGLLDGWRNECFDLTDGGGNPLFTLERAA 
I I I II I I I I I I I I I I II I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
CSESSDGIFLNADGWPDMGGRLQHLARTWNKAGLLHGWRNECFDLTDGGGNPLFTLERAA 
70 80 90 100 110 120 

130 140 150 160 170 180 

FRPFGLLSRAVHLNGLTESDGRWHFWIGRRSPHKAVDPNKLDNTAAGGVSGGEMPSEAVC 
I I I I I I I I I I I I I I I I : I I : I I I I I I I I I II I II I I I I : I I I I I : i I M I I I I I I I I I I 
FRPFGLLSRAVHLNGLVESNGRWHFWIGRRSPHKAVDPGKLDNIAGGGVSGGEMPSEAVC 
130 140 150 160 170 180 

190 200 210 220 230 240 

RES SEEAGLDKTLL PL IRPVSQLHS LRS VSRGVHNE I LYVFDAVL PETFLPENQDGEVAG 
I I I I I I I I I I M I : I I I I I ! I : I I I ! I I I I I I I I I I I I I I I I I I I I I I I I M I I I M M 

RES SEEAGLDKTLFPLIRPVSRLHSLRPVSRGVHNE I LYVFDAVL PETFLPENQDGEVAG 
190 200 210 220 230 240 



orf 105-1 .pep 
orfl05ng-l 

orf 105-1 .pep 
orf 105ng-l 

orf 105-1 .pep 
orfl05ng-l 

orf 105-1 .pep 
orf 105ng-l 
250 260 270 280 290 

orf 105-1 .pep FEKMD I GGLLDAMLSGNMMHDAQLVTLDAFCRYGL I DAAHPLSEWLDGIRLX 
I II I I I I I I I I I M I I I M I I I I M M I I I M II II I I I I I I I I I I 1 I I I 
orfl05ng-l FEKMDIGGLLDAMLSKNMMHDAQLVTLDAFYRYGLI DAAHPLSEWLDGIRLX 

250 260 270 280 290 

Furthermore, ORF105ng-1 shows homology with a yeast enzyme: 

sp|P41888|TNR3_SCHPO THIAMIN PYROPHOSPHOKINASE (TPK) (THIAMIN KINASE) 
>gi|1076928|pir||S52350 thiamin pyrophosphokinase (EC 2.7.6.2) - fission yeast 
(Schizosaccharomyces pombe) >gi|666111 (X84417) thiamin pyrophosphokinase 
[Schizosaccharomyces pombe] >gi|2330852|gnl|PID|e334056 (Z98533) thiamin 
pyrophosphokinase [Schizosaccharomyces pombe] Length = 569 
Score = 105 bits (259), Expect = 4e-22 
Identities = 64/192 (33%), Positives = 94/192 (48%), Gaps = 3/192 (1%) 

Query: 268 NKAGLLHGWRNECFDLTDGGGNPLFTLERAAFRPFGLLSRAVHLNGLVESNGRW--HFWI 441 
 N G+ WRNE + + P+ +ER F FG LS VH + + W+ 
Sbjct: 96 NTFGIADQWRNELYTVYGKSKKPVLAVERGGFWLFGFLSTGVHCTMYIPATKEHPLRIWV 155 

Query: 442 GRRSPHKAVDPGKLDNIAGGGVSGGEMPSEAVCRESSEEAGLDKTLFPLIRPVSRLHSLR 621 
 RRSP K P LDN GG++ G+ + +E SEEA LD + LI P + ++ 
Sbjct: 156 PRRSPTKQTWPNYLDNSVAGGIAHGDSVIGTMIKEFSEEANLDVSSMNLI-PCGTVSYIK 214 

Query: 622 PVSRG-VHNEILYVFDAVLPETFLPENQDGEVAGFEKMDIGGLLDAMLSKNMMHDAQLVT 798 
 R + E+ YVFD + + +P DGEVAGF + + +L + K+ + LV 
Sbjct: 215 MEKRHWIQPELQYVFDLPVDDLVIPRINDGEVAGFSLLPLNQVLHELELKSFKPNCALVL 274 

Query: 799 LDAFYRYGLIDAAHP 843 
 LD R+G+I HP 
Sbjct: 275 LDFLIRHGIITPQHP 289 

Based on this analysis, including the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
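
The summary line of the thiamin pyrophosphokinase comparison above reports each statistic as a count, a total and a percentage. As a rough illustration, the sketch below parses such a line and compares each reported percentage with the count divided by the total. The function name `parse_blast_summary` is hypothetical, and the use of truncating integer division is an assumption that happens to reproduce the three percentages in this particular line.

```python
import re


def parse_blast_summary(line):
    """Return {field: (count, total, reported_percent)} for one summary line."""
    pattern = re.compile(r"(\w+)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")
    return {name: (int(count), int(total), int(pct))
            for name, count, total, pct in pattern.findall(line)}


summary = ("Identities = 64/192 (33%), Positives = 94/192 (48%), "
           "Gaps = 3/192 (1%)")

for name, (count, total, pct) in parse_blast_summary(summary).items():
    # Assumption: percentages are truncated, not rounded.
    consistent = (100 * count // total) == pct
    print(name, count, total, pct, consistent)
```

For this line all three fields come out consistent under truncation; a mismatch on another line would suggest either a different rounding rule or a transcription error in the counts.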



Example 49 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
419>: 

1 ATGAATAGAC CCAAGCAACC CTTCTTCCGT CCCGAAGTCG CCGTTGCCCG 

51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT 

101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT 

151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT 

201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGgATACG rGkACAATTA 

251 CAGCGAAATT CGTGGAAGAT GGmsAAAAGG TTAAGGCTGG CGACAAGCTA 

301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGGTAGCG TGCAGCAGCA 

351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG 

401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAgCcT TAAAGCAACT 

451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG 

501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT 

551 TCCTATCCGC .CAATGA 

This corresponds to the amino acid sequence <SEQ ID 420; ORF107>: 

1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWTTFA SISALLIILF 

51 LIFGNYTRKT TVEGQILPAS GVIRVYAPDT XTITAKFVED GXKVKAGDKL 

101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT 

151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSXQ* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF107 shows 97.8% identity over a 186aa overlap with an ORF (ORF107a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orfl07 pep MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 
| | | | | | I I I I I I I I I M I I I I i I I I I I I I I I I I I I I I I I 1 I I I I 1 I I i M I i I I I I I I I I 
orfl07a MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 



10 20 30 40 50 60 



70 80 90 100 110 120 

TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT 
I | | I I I I I I i I I I I I I I I 1 I I I I I I I III I II I I I I I I I I I I II I I I I MINIM 
TVEGQILPASGVIRVYAPDTGTITAKFXEDGEKVKAGDKLFALSTSRFGAGDSVQQQLKT 

70 80 90 100 110 120 

130 140 150 160 170 180 

EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ 

[ I [ M M M I M M I M II I I II II I I I II II II I I II I I 

EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ 

130 140 150 160 170 180 



orfl07 .pep 
orfl07a 



The complete length ORF107a nucleotide sequence <SEQ ID 421> is: 

1 ATGAATAGAC CCAAGCAACC NTTCTTCCGT CCCGAAGTCG CCGTTGCCCG 

51 CCAAACCAGC CTGACGGGTA AAGTGATTCT GACACGACCG TTGTCATTTT 

101 CCCTATGGAC GACATTTGCA TCGATATCTG CGTTATTGAT TATCCTGTTT 

151 TTGATATTTG GTAACTATAC GCGAAAGACA ACAGTGGAGG GACAAATTTT 

201 ACCTGCATCG GGCGTAATCA GGGTGTATGC ACCGGATACG GGGACAATTA 

251 CNGCGAAATT CNTGGAAGAT GGAGAAAAGG TTAAGGCTGG CGACAAGCTA 

301 TTTGCGCTTT CGACCTCACG TTTCGGCGCA GGAGATAGCG TGCAGCAGCA 

351 GTTGAAAACG GAGGCAGTTT TGAAGAAAAC GTTGGCAGAA CAGGAACTGG 

401 GTCGTCTGAA GCTGATACAC GGGAATGAAA CGCGCAGCCT TAAAGCAACT 

451 GTCGAACGTT TGGAAAACCA GGAACTCCAT ATTTCGCAAC AGATAGACGG 

501 TCAGAAAAGG CGCATTAGAC TTGCGGAAGA AATGTTGCAG AAATATCGTT 

551 TCCTATCCGC CAATGATGCA GTGCCAAAAC AAGAAATGAT GAATGTCAAG 

601 GCAGAGCTTT TAGAGCAGAA AGCCAAACTT GATGCCTACC GCCGAGAAGA 

651 AGTCGGGCTG CTTCAGGAAA TCCGCACGCA GAATCTGACA TTGGNNAGCC 

701 TCCCCCAAGC GGCATGA 


This encodes a protein having amino acid sequence <SEQ ID 422>: 

1 MNRPKQPFFR PEVAVARQTS LTGKVILTRP LSFSLWT TFA SISALLIILF 

51 LIFGNYTRKT TVEGQILPAS GVIRVYAPDT GTITAKFXED GEKVKAGDKL 

101 FALSTSRFGA GDSVQQQLKT EAVLKKTLAE QELGRLKLIH GNETRSLKAT 

151 VERLENQELH ISQQIDGQKR RIRLAEEMLQ KYRFLSANDA VPKQEMMNVK 

201 AELLEQKAKL DAYRREEVGL LQEIRTQNLT LXSLPQAA* 

Homology with a predicted ORF from N. gonorrhoeae 

ORF107 shows 95.7% identity over a 188aa overlap with a predicted ORF (ORF107.ng) from N. 
gonorrhoeae: 






orf 107 . pep MNRPKQPFFRPEVAVARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 
I I I I I I I I I I I I I I : I I I II II I I I I II II I I I I I II I I I I I I I I II I I II II II II II I 
orfl07ng MNRPKQPFFRPEVAIARQTSLTGKVILTRPLSFSLWTTFASISALLIILFLIFGNYTRKT 



orf 107 .pep TVEGQILPASGVIRVYAPDTXTITAKFVEDGXKVKAGDKLFALSTSRFGAGGSVQQQLKT 120 
I : I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I II I I II I I I M I I I I I I 
orfl07ng TMEGQILPASGVIRVYAPDTGTITAKFVEDGEKVKAGDKLFALSTSRFGAGGSVQQQLKT 120 

orfl07 pep EAVLKKTLAEQELGRLKLIHGNETRSLKATVERLENQELHISQQIDGQKRRIRLAEEMLQ 180 

| | | | | | | [ | [ | I I I M I I I I I M I I I I I I I I I I I I I : M I M I I I I I I I I 1 I I I I I I I : 
orfl07ng EAVLKKTLAEQELGRLKLIHENETRSLKATVERLENQKLHISQQIDGQKRRIRLAEEMLR 180 

orfl07.pep KYRFLSXQ 188 

MINI I 
orfl07ng KYRFLSAQ 188 

The complete length ORF107ng nucleotide sequence <SEQ ID 423> is predicted to encode a 
protein having amino acid sequence <SEQ ID 424>: 

1 MNRPKQPFFR PEVAIARQTS LTGKVILTRP LSFSLWT TFA SISALLIILF 

51 LIFGN YTRKT TMEGQILPAS GVIRVYAPDT GTITAKFVED GEKVKAGDKL 

101 FALSTSRFGA GGSVQQQLKT EAVLKKTLAE QELGRLKLIH ENETRSLKAT 

151 VERLENQKLH ISQQIDGQKR RIRLAEEMLR KYRFLSAQ* 

Based on the presence of a putative transmembrane domain in the gonococcal protein, it is predicted 
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies. 
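
The application does not state which program predicted the putative transmembrane domain. Purely as an illustration of how such a region can be flagged, the sketch below computes a sliding-window Kyte-Doolittle hydropathy profile, which is a swapped-in, commonly used technique and not necessarily the one used here. The window of 19 residues and the threshold of 1.6 are conventional heuristics and are assumptions, as are the names `KD` and `hydropathy_profile`.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window along the sequence.

    Characters without a table value (spaces, X, '*') are dropped first,
    so window positions refer to the filtered sequence.
    """
    residues = [c for c in protein.upper() if c in KD]
    return [
        sum(KD[c] for c in residues[start:start + window]) / window
        for start in range(len(residues) - window + 1)
    ]


# First 60 residues of the gonococcal protein <SEQ ID 424> listed above.
n_terminus = ("MNRPKQPFFR" "PEVAIARQTS" "LTGKVILTRP"
              "LSFSLWTTFA" "SISALLIILF" "LIFGNYTRKT")
profile = hydropathy_profile(n_terminus)
# Windows whose mean exceeds the assumed threshold are candidate
# membrane-spanning segments.
print(any(score > 1.6 for score in profile))
```

A window passing the threshold is only a hint; it does not by itself establish membrane topology.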



Example 50 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
425>: 



1 ATGCTGAATA CTTTTTTTGC CGTATTGGGC GGCTGCCTGC TGCT.TTGCC 

51 GTGCGGCAAA TCCGTAAATA CGGCGGTACA GCCGCAAAAC GCGGTACAAA 

101 GCGCGCCGAA ACCGGTTTTC AAAGTCATAT ATATCGACAA TACGGCGATT 

151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA 

201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC 

251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT 

301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA 

This corresponds to the amino acid sequence <SEQ ID 426; ORF108>: 

1 MLNTFFAVLG GCLLXLPCGK SVNTAVQPQN AVQSAPKPVF KVIYIDNTAI 

51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC 

101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

Further work revealed the following DNA sequence <SEQ ID 427>: 



1 ATGCTGAAAA CATCTTTTGC CGTATTGGGC GGCTGCCTGC TGCTTGCCGC 

51 CTGCGGCAAA TCCGAAAATA CGGCGGAACA GCCGCAAAAC GCGGTACAAA 

101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ATATCGACAA TACGGCGATT 

151 GCCGGTTTGG ATTTGGGACA AAGCAGCGAA GGCAAAACCA ACGACGGCAA 

201 AAAACAAATC AGTTATCCGA TTAAAGGCTT GCCGGAACAA AATGTTATCC 

251 GACTGATCGG CAAGCATCCC GGCGACTTGG AAGCCGTCAG CGGCAAATGT 

301 ATGGAAACCG ATGATAAGGA CAGTCCGGCA GGTTGGGCAG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC GGATTACCTA GTTTCGCATG CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GAAGGGGCGT TTTATTTCCG CCGCCGCCAT TATTGA 

This corresponds to the amino acid sequence <SEQ ID 428; ORF108-1>: 
1 MLKTSFAVLG GCLLLAA CGK SENTAEQPQN AVQSAPKPVF KVKYIDNTAI 

51 AGLDLGQSSE GKTNDGKKQI SYPIKGLPEQ NVIRLIGKHP GDLEAVSGKC 

101 METDDKDSPA GWAENGVCHT LFAKLVGNIA EDGGKLTDYL VSHAALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae 

ORF108 shows 88.4% identity over a 181aa overlap with a predicted ORF (ORF108.ng) from N. 
gonorrhoeae: 



orfl08 pep MLNTFFAVLGGCLLXLPCGKSVNTAVQPQNAVQSAPKPVFKVIYIDNTAI AGLDLGQSSE 60 

||: | I I I I I I I I I I II III M I I I : II II I I I I I I I I I I I I I I I I I I I I I I 
orfl08ng MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60 

orfl08 .pep GKTNDGKKQI SYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120 

I I I I I I II I I I I I I II I I I II : : I I I I I I : I II I I I I II I I I I I : I : I I I I I I I II I 
orf 108ng GKTNDGKKQI SYPIKGLPEQNAVRLTGKHPNDLEAWGKCMETDGKDAPS GWAENGVCHT 120 

orfl08 .pep LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181 

I I I I I I I I I I I I I I I I I I I! : I I : II I I I I II I I I I II I I I I I I I I II I I I I I M I I I I I I 
orfl08ng LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181 

ORF108-1 shows 92.3% identity with ORF108ng over the same 181 aa overlap: 

MLKTSFAVLGGCLLLAACGKSENTAEQPQNAVQSAPKPVFKVKYIDNTAI AGLDLGQSSE 60 

III I I I I I II I I I I I I I I I I I I I I I I I I I : I M I I I I I I I I I 1 I I I I I I I I I I I I I I 

MLKIPFAVLGGCLLLAACGKSENTAEQPQNAAQSAPKPVFKVKYIDNTAIAGLALGQSSE 60 

GKTNDGKKQI SYPIKGLPEQNVIRLIGKHPGDLEAVSGKCMETDDKDSPAGWAENGVCHT 120 
I I I I I I I I I I I I I I II I I I I I : : I I 1111:11111 I I I I I I I I I : I : I I II I I I I I I 
GKTNDGKKQI SYPIKGLPEQNAVRLTGKHPNDLEAWGKCMETDGKDAPSGWAENGVCHT 120 

LFAKLVGNIAEDGGKLTDYLVSHAALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181 

LFAKLVGNIAEDGGKLTDYLISHSALQPYQAGKSGYAAVQNGRYVLEIDSEGAFYFRRRHY 181 

The complete length ORF108ng nucleotide sequence <SEQ ID 429> is: 

1 ATGCTGAAAa tacctTTTGC CGTGTtgggc ggCtgcctGC TGCTTGCCGC 

51 CTGCGGCAAA TCCGAAAATa cggcggaACA GCCGCAAAAT gcggCACAAA 

101 GCGCGCCGAA ACCGGTTTTC AAAGTCAAAT ACATCGACAA TACGGCGATT 

151 GCCGGTTTGG CTTTGGGACA AAGTAGCGAA GGCAAAACCA acgacgGCAA 

201 AAAACAAATC AGTTATccgA TTAAAGGCTT GCCGGAACAA AacgccgtCC 

251 gGCTGACCGG AAAGCATCCC AACGACTTGG AagccgtcgT CGGCAAATGT 

301 ATGGAAACCG ACGGAAAGGA CGCGCCTTCG GGCTGGGCGG AAAACGGCGT 

351 GTGCCATACC TTGTTTGCCA AACTGGTGGG CAATATCGCC GAAGACGGCG 

401 GCAAACTGAC TGATTACCTG ATTTCGCATT CCGCCCTGCA ACCCTATCAG 

451 GCAGGCAAAA GCGGCTATGC CGCCGTGCAG AACGGACGCT ATGTGCTGGA 

501 AATCGACAGC GagggGGCGT TTTATttccg ccgccgccat tattgA 

This encodes a protein having amino acid sequence <SEQ ID 430>: 

1 MLKI PF AVLG GCLLLAAC GK SENTAEQPQN AAQSAPKPVF KVKYIDNTAI 

51 AGLAL GQSSE GKT NDGKKQI SYPIKGLPEQ NAVRLTGKHP NDLEAVVGKC 

101 METDGKDAPS GWAENGVCHT LFAKLVGNIA EDGGKLTDYL ISHSALQPYQ 

151 AGKSGYAAVQ NGRYVLEIDS EGAFYFRRRH Y* 

Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein 
lipid attachment site (underlined) and a putative ATP/GTP-binding site motif A (P-loop, 
double-underlined) in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 
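
The identity figures quoted in this example can be re-derived from the listed sequences. The sketch below is a minimal illustration that assumes the two proteins align end to end without gaps, as the 181 aa alignment of ORF108-1 <SEQ ID 428> and ORF108ng <SEQ ID 430> above suggests. The function name `percent_identity` is hypothetical, and the two strings are transcribed from the listings with layout spaces removed and the stop symbol omitted.

```python
def percent_identity(seq_a, seq_b):
    """Count identical positions in two equal-length, gap-free sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches, 100.0 * matches / len(seq_a)


# ORF108-1 <SEQ ID 428>, written in the same groups of ten as the listing.
ORF108_1 = (
    "MLKTSFAVLG" "GCLLLAACGK" "SENTAEQPQN" "AVQSAPKPVF" "KVKYIDNTAI"
    "AGLDLGQSSE" "GKTNDGKKQI" "SYPIKGLPEQ" "NVIRLIGKHP" "GDLEAVSGKC"
    "METDDKDSPA" "GWAENGVCHT" "LFAKLVGNIA" "EDGGKLTDYL" "VSHAALQPYQ"
    "AGKSGYAAVQ" "NGRYVLEIDS" "EGAFYFRRRH" "Y"
)

# ORF108ng <SEQ ID 430>.
ORF108NG = (
    "MLKIPFAVLG" "GCLLLAACGK" "SENTAEQPQN" "AAQSAPKPVF" "KVKYIDNTAI"
    "AGLALGQSSE" "GKTNDGKKQI" "SYPIKGLPEQ" "NAVRLTGKHP" "NDLEAVVGKC"
    "METDGKDAPS" "GWAENGVCHT" "LFAKLVGNIA" "EDGGKLTDYL" "ISHSALQPYQ"
    "AGKSGYAAVQ" "NGRYVLEIDS" "EGAFYFRRRH" "Y"
)

matches, pct = percent_identity(ORF108_1, ORF108NG)
print(matches, len(ORF108_1), round(pct, 1))  # 167 181 92.3
```

The result agrees with the 92.3% identity over 181 aa stated for ORF108-1 and ORF108ng. Alignments that contain gaps would need the gap columns included in both strings before counting.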



Example 51 

The following DNA sequence was identified in N. meningitidis <SEQ ID 431>: 

1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGgATTTATC GATgcgatTg cGggCGGGGG TGGTTTGATT ACGCTGCCCG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAgCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG 

251 TAGGCGGCGT GGcCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTgCTgGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTGTT cGGGCTGACG GTCGC.ACCG CTTTTGGGTT TTTACGACGG 

451 TGTGTTCGGA CCGGGTGTCG GCTCGTTTTT TCTGATTGCC TTTATTGTTT 

501 TGCTCGGCTG CAAgCTGTTG AACGCGATGT CTTACACCAA ATTGGCGAAC 

551 GTTGCCTGCA ATCTTGGTTC GCTATCGGTA TTCCTGCTGC ACGGTTCGAT 

601 TATTTTCCCG ATTGCGGCAA CGaTGGCGGT CGGTGCGTTT GTCGGtGCGA 

651 ATTTAgGTGC GAGATTTGCC GTaCgctTCG GTTCGAAGCT GATTAA 

This corresponds to the amino acid sequence <SEQ ID 432; ORF109>: 



1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFVGGVAGA LSVSLVSKDI 

101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VXTAFGFLRR 

201 YFPDCGNDGG RCVCRCEFRC EICRTLRFEA D* 

Further work revealed the following DNA sequence <SEQ ID 433>: 



1 ATGGAAGATT TATATATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCCG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAGCAG CCGCTGCTAC GTTTTCAGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG 

251 TAGGCGGCGT GGCCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCAC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTGTT CGGGCTGACG GTCGCACCGC TTTTGGGTTT TTACGACGGT 

451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT 

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG 

551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 

601 ATTTTCCCGA TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA 

651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC 

701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG 

751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 434; ORF109-1>: 



1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIA TNK 

51 LQAAAATFSA TVSFARKGLI DWKKGLPIA A AS FVGGVAGA LSVSLV SKDI 

101 LLAVVPVLLI FVALYF VFSP KLDGSKEGKA R MSFFLFGLT VAPLLGFY DG 

151 VFGPG VGSFF LIAFIVLLGC KL LNAMSYTK LANVACNLGS LSVFLLHGSI 

201 IFPIAATMAV GAFVGA NLGA RFAVRFGSKL I KPLLIVISI SMAVKLLID E 

251 RNPLYQMIVS MF* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF109 shows 95.9% identity over a 147aa overlap with an ORF (ORF109a) from strain A of N. 
meningitidis: 

orf109.pep MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 
| | | | M | ! | [ I I I I I I I I I I I I I I I I I M I I I I I I I I I I I 1 1 1 1 I I I I M II I I I I I I I I 
orf 10 9a MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 
10 20 30 40 50 60 

70 80 90 100 110 120 

orf 10 9 pep TVSFARKGLIDWKKGLPIAAAS FVGGVAGALSVSLVSKDILLAWPVLLIFVALYFVFSP 
[ | | | I | II I I I I I I I I I i I I I I I : I I I : I II I I I I I 1 M I I I I I I I I I I I I II I i I I I I I 
10 orfl09a TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 10 9. pep KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 

15 I I I I II II II I I I I I I M I I I : I I 

orf 109a KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK 
130 140 150 160 170 180 

The complete length ORF109a nucleotide sequence <SEQ ID 435> is: 

1 ATGGAAGATT TATACATAAT ACTCGCTTTG GGTTTGGTTG CGATGATTGC 

51 CGGATTTATC GATGCGATTG CGGGTGGGGG TGGTTTGATT ACGCTGCCTG 

101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG 

151 CTGCAAGCAG CCGCTGCTAC GTTTTCGGCT ACGGTTTCTT TTGCACGCAA 

201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCGGCA GCATCGTTTG 

251 CAGGCGGCGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT 

301 CTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCGC TGTATTTTGT 

351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT 

401 TTTTTCTGTT CGGTCTGACG GTTGCACCAC TTTTGGGTTT TTACGACGGT 

451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT 

501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG 

551 TTGCCTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT 

601 ATTTTCCCGA TTGCGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA 

651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC 

701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG 

751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA 

35 This encodes a protein having amino acid sequence <SEQ ID 43 6>: 

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK
51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI
101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG
151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI
201 IFPIAATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE
251 RNPLYQMIVS MF*
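
The correspondence between a nucleotide listing such as <SEQ ID 435> and its encoded protein such as <SEQ ID 436> can be checked by direct translation. The following Python sketch is illustrative only and is not part of the original analysis. It assumes the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are hypothetical helpers introduced here.

```python
from itertools import product

# Standard genetic code, with codons enumerated in the base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in frame 1 (assumed); unknown codons become 'X'."""
    # Drop the position numbers, spaces and dots used by the printed layout.
    seq = "".join(ch for ch in dna.upper() if ch.isalpha())
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 30 bases of the listing, copied with its position number and spacing.
print(translate("1 ATGGAAGATT TATACATAAT ACTCGCTTTG"))  # MEDLYIILAL
```

Codons containing an ambiguous base (for example `N`) are reported as `X`, and a stop codon is reported as `*`, matching the notation used in the protein listings.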

ORF109a and ORF109-1 show 99.2% identity in a 262 aa overlap:

                     10        20        30        40        50        60
orf109a.pep  MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1     MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf109a.pep  TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
             |||||||||||||||||||||||:|||:||||||||||||||||||||||||||||||||
orf109-1     TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf109a.pep  KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1     KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf109a.pep  LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1     LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
                    190       200       210       220       230       240

                    250       260
orf109a.pep  SMAVKLLIDERNPLYQMIVSMFX



Homology with a predicted ORF from N. gonorrhoeae 

ORF109 shows 98.3% identity over a 231aa overlap with a predicted ORF (ORF109.ng) from N. 
gonorrhoeae: 

orf109.pep   MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109ng     MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA 60

orf109.pep   TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 120
             |||||||||||||||||||||||:|||:||||||||||||||||||||||||||||||||
orf109ng     TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP 120

orf109.pep   KLDGSKEGKARMSFFLFGLTVXTAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180
             ||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||
orf109ng     KLDGSKEGKARMSFFLFGLTVATAFGFLRRCVRTGCRLVFSDCLYCFARLQAVERDVLHQ 180

orf109.pep   IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRTLRFEAD 231
             |||||||||||||||||||||||||||||||||||||||||||| ||||||
orf109ng     IGERCLQSWFAIGIPAARFDYFPDCGNDGGRCVCRCEFRCEICRPLRFEAD 231
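
The identity percentages quoted for alignments such as the one above can be recomputed from the aligned rows. The following Python sketch is illustrative only. It assumes that identity means identical residues divided by aligned positions, with gap columns (`-`) excluded; the program used for the original analysis may define the overlap differently. The function name `percent_identity` is hypothetical.

```python
def percent_identity(row_a, row_b):
    """Percent of aligned (non-gap) columns holding the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    # Keep only columns where neither sequence has a gap character.
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned positions to compare")
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


# Residues 61-120 of the ORF109 / ORF109ng alignment: 58 of 60 columns agree.
orf109 = "TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP"
orf109ng = "TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP"
print(round(percent_identity(orf109, orf109ng), 1))
```

Applied to all four rows together, four differing columns out of 231 would give the quoted 98.3%.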

An ORF109ng nucleotide sequence <SEQ ID 437> was predicted to encode a protein having amino
acid sequence <SEQ ID 438>:

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK
51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI
101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VATAFGFLRR
151 CVRTGCRLVF SDCLYCFARL QAVERDVLHQ IGERCLQSWF AIGIPAARFD
201 YFPDCGNDGG RCVCRCEFRC EICRPLRFEA D*

Further work revealed the following gonococcal DNA sequence <SEQ ID 439>: 

1 ATGGAAGATT TATACATAAT ACTCGCTTTG GGTTTGGTTG CGATGATCGC
51 CGGATTTATC GATGCGATTG CGGGCGGGGG TGGTTTGATT ACGCTGCCTG
101 CACTCTTGTT GGCAGGTATT CCTCCCGTGT CGGCAATTGC CACCAACAAG
151 CTGCAAGCAG CCGCTGCTAC GTTTTCGGCT ACGGTTTCTT TTGCACGCAA
201 AGGTTTGATT GATTGGAAGA AAGGTCTCCC GATTGCCGCA GCATCGTTTG
251 CAGGCGGCGT GGTCGGTGCA TTATCGGTCA GCTTGGTTTC CAAAGATATT
301 TTGCTGGCGG TCGTGCCGGT TTTGTTGATA TTTGTCGCGC TGTATTTTGT
351 GTTTTCGCCC AAGCTCGACG GCAGTAAGGA AGGCAAAGCC AGAATGTCTT
401 TTTTTCTATT CGGGCTGACG GTTGCACCGC TTTTGGGTTT TTACGACGGT
451 GTGTTCGGAC CGGGTGTCGG CTCGTTTTTT CTGATTGCCT TTATTGTTTT
501 GCTCGGCTGC AAGCTGTTGA ACGCGATGTC TTACACCAAA TTGGCGAACG
551 TTGCTTGCAA TCTTGGTTCG CTATCGGTAT TCCTGCTGCA CGGTTCGATT
601 ATTTTCCCGA TTGTGGCAAC GATGGCGGTC GGTGCGTTTG TCGGTGCGAA
651 TTTAGGTGCG AGATTTGCCG TCCGCTTCGG TTCGAAGCTG ATTAAGCCGC
701 TGCTGATTGT CATCAGCATT TCGATGGCTG TGAAATTGTT GATAGACGAG
751 AGAAATCCGC TGTATCAGAT GATTGTTTCG ATGTTTTAA
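
The listings in this document follow a fixed layout: a 1-based start position followed by up to five blocks of ten residues per line. The following Python sketch is illustrative only and shows one way to regenerate that layout from a continuous sequence string; the function name `format_listing` and its parameters are hypothetical.

```python
def format_listing(seq, block=10, per_line=5):
    """Return listing lines: start position, then space-separated blocks."""
    width = block * per_line
    lines = []
    for start in range(0, len(seq), width):
        row = seq[start:start + width]
        # Split the row into fixed-size blocks; the last block may be short.
        blocks = [row[i:i + block] for i in range(0, len(row), block)]
        lines.append(str(start + 1) + " " + " ".join(blocks))
    return lines


# First 53 bases of the gonococcal sequence above, as a continuous string.
fragment = ("ATGGAAGATT" "TATACATAAT" "ACTCGCTTTG" "GGTTTGGTTG"
            "CGATGATCGC" "CGG")
for line in format_listing(fragment):
    print(line)
```

Regenerating the layout this way makes stray digits or split blocks in a listing easy to spot by comparison.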

This corresponds to the amino acid sequence <SEQ ID 440; ORF109ng-1>:

1 MEDLYIILAL GLVAMIAGFI DAIAGGGGLI TLPALLLAGI PPVSAIATNK
51 LQAAAATFSA TVSFARKGLI DWKKGLPIAA ASFAGGVVGA LSVSLVSKDI
101 LLAVVPVLLI FVALYFVFSP KLDGSKEGKA RMSFFLFGLT VAPLLGFYDG
151 VFGPGVGSFF LIAFIVLLGC KLLNAMSYTK LANVACNLGS LSVFLLHGSI
201 IFPIVATMAV GAFVGANLGA RFAVRFGSKL IKPLLIVISI SMAVKLLIDE
251 RNPLYQMIVS MF*

ORF109ng-1 and ORF109-1 show 98.9% identity in a 262 aa overlap:

                        10        20        30        40        50        60
orf109ng-1.pep  MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1        MEDLYIILALGLVAMIAGFIDAIAGGGGLITLPALLLAGIPPVSAIATNKLQAAAATFSA
                        10        20        30        40        50        60

                        70        80        90       100       110       120
orf109ng-1.pep  TVSFARKGLIDWKKGLPIAAASFAGGVVGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                |||||||||||||||||||||||:|||:||||||||||||||||||||||||||||||||
orf109-1        TVSFARKGLIDWKKGLPIAAASFVGGVAGALSVSLVSKDILLAVVPVLLIFVALYFVFSP
                        70        80        90       100       110       120

                       130       140       150       160       170       180
orf109ng-1.pep  KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf109-1        KLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFFLIAFIVLLGCKLLNAMSYTK
                       130       140       150       160       170       180

                       190       200       210       220       230       240
orf109ng-1.pep  LANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
                ||||||||||||||||||||||||:|||||||||||||||||||||||||||||||||||
orf109-1        LANVACNLGSLSVFLLHGSIIFPIAATMAVGAFVGANLGARFAVRFGSKLIKPLLIVISI
                       190       200       210       220       230       240

                       250       260
orf109ng-1.pep  SMAVKLLIDERNPLYQMIVSMFX
                |||||||||||||||||||||||
orf109-1        SMAVKLLIDERNPLYQMIVSMFX
                       250       260

In addition, ORF109ng-1 shows homology to a hypothetical Pseudomonas protein:

sp|P29942|YCB9_PSEDE HYPOTHETICAL 27.4 KD PROTEIN IN COBO 3' REGION (ORF9)
>gi|94984|pir||I38164 hypothetical protein 9 - Pseudomonas sp >gi|551929
(M62866) ORF9 [Pseudomonas denitrificans] Length = 261
Score = 175 bits (439), Expect = 3e-43 

Identities = 83/214 (38%), Positives = 131/214 (60%), Gaps = 1/214 (0%) 

Query: 41 PPVSAIATNKLQXXXXXXXXXXXXXRKGLIDWKKGLPIXXXXXXXXXXXXXXXXXXXKDI 100 

PP+ + TNKLQ R+G ++ K+ LP+ D+ 

Sbjct: 43 PPLQTLGTNKLQGLFGSGSATLSYARRGHVWLKEQLPMALMSAAGAVLGALLATIVPGDV 102 

Query: 101 LLAVVPVLLIFVALYFVFSPKLDGSKEGKARMSFFLFGLTVAPLLGFYDGVFGPGVGSFF 160

L A++P LLI +ALYF P + G + +R++ F+F LT+ PL+GFYDGVFGPG GSFF 
Sbjct: 103 LKAILPFLLIAIALYFGLKPNM-GDVDQHSRVTPFVFTLTLVPLIGFYDGVFGPGTGSFF 161 

Query: 161 LIAFIVLLGCKLLNAMSYTKLANVACNLGSLSVFLLHGSIIFPIVATMAVGAFVGANLGA 220 

++ F+ L G +L A ++TK N N+G+ VFL G++++ + M +G F+GA +G+ 
Sbjct: 162 MLGFVTLAGFGVLKATAHTKFLNFGSNVGAFGVFLFFGAVLWKVGLLMGLGQFLGAQVGS 221 

Query: 221 RFAVRFGSKLIKPLLIVISISMAVKLLIDERNPL 254 

R+A+ G+K+IKPLL+++SI++A++LL D +PL 
Sbjct: 222 RYAMAKGAKIIKPLLVIVSIALAIRLLADPTHPL 255
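
The summary line of a BLAST report such as the one above carries its counts as plain fractions, so they can be extracted mechanically when comparing several hits. The following Python sketch is illustrative only; the pattern `STAT_PATTERN` and the function `parse_blast_stats` are hypothetical names, and the sketch only reads the `count/total` pairs without interpreting the printed percentages.

```python
import re

# Matches entries such as "Identities = 83/214" in a BLAST summary line.
STAT_PATTERN = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)")


def parse_blast_stats(line):
    """Map each statistic name to its (count, total) pair of integers."""
    return {
        name: (int(count), int(total))
        for name, count, total in STAT_PATTERN.findall(line)
    }


summary = "Identities = 83/214 (38%), Positives = 131/214 (60%), Gaps = 1/214 (0%)"
stats = parse_blast_stats(summary)
print(stats["Identities"])  # (83, 214)
```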

Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
Example 52

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 441>:

1 ..ctgctagggt attgcatcgg ttatcggtac ggctgttgca gcaaaaccag

51 CCGCAGACGG ATTATTTGGT CAAATTCGGA TCGTTTTGGG CGAG . ATTTT 

101 tggttttctg ggactgtatg ACGTCTATGC TTCGGCATGG TTTGTCGTTA 

151 TCATGATGTT TTTGGTGGTT TCTACCAGTT TGTGCCTGAT TCGCAATGTG 

201 CCGCCGTTCT GGCGCGAAAT GAAGTCTTTT CGGGAAAAGG TTAAAGAAAA 

251 ATCTCTGGCG GCGATGCGCC ATTCTTCGCT GTTGGATGTA AAAATTGCGC 

301 CCGAGGTTGC CAAACGTTAT CTGGAAGTAC AAGGTTTTCA GGGGAAAACC 

351 ATTAACCGTG AAGACGGGTC GGTTCTGATT GCCGCCAAAA AAGGCACAAT 

401 GAACAAATGG GGCTATATCT TTGCCCATGT TGCTTTGATT GTCATTTGCC

451 TGGGCGGGTT GATAGACAGT AACCTGCTGT TGAAACTGGG TATGCTGACC 

501 GGTCGGATTg TTCCGGACAA TCAGGCGGTT TATGCCAAGG ATTTC.AAGC 

551 CCGAAAGTAT . TTTGGGTGC gTCCAATCTC TCATTTAGGG GCAACGTCAA 

601 TATTTCCG.A GGGGCAGAgT GCGGATGTGG TTTTCCTGA 

This corresponds to the amino acid sequence <SEQ ID 442; ORF110>:

1 ..LLGIASVIGT LLQQNQPQTD YLVKFGSFWA XIFGFLGLYD VYASAWFVVI
51 MMFLVVSTSL CLIRNVPPFW REMKSFREKV KEKSLAAMRH SSLLDVKIAP
101 EVAKRYLEVQ GFQGKTINRE DGSVLIAAKK GTMNKWGYIF AHVALIVICL
151 GGLIDSNLLL KLGMLTGRIF RTIRRFMPRI XKPESXFGCV QSLI*GQRQY
201 FXRGRVRMWF S*

Computer analysis of this amino acid sequence gave the following results: 
Homology with ORF88a from N. meningitidis (strain A) 

ORF110 shows 91.5% identity over a 188aa overlap with ORF88a from strain A of N. meningitidis:

10 20 30 40 50 60 

orf8 8a.pep MSKSRRSPPLLSRPWFAFFSSMRF AVALLSLLGIASVIGTVL QQNQPQTDYLVKFGSFWA 

I I I I I I II M = I I I I I I I I I 1 I I I I I I I I I 
orfllO LLGIASVI GTLL QQNQPQTDYLVKFGS FWA 

10 20 30 

70 80 90 100 110 120 

orf 88a . pep QIFGFLGLYDVYASAW FWIMMFLWSTSLCLI RNVPPFWREMKSFREKVKEKSLAAMRH 
I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II I I I I I II I I I I I I I I I I I 
orf 1 10 XIFGFLGLYDVYASAW FVVIMMFLVVSTSLCLI RNVPPFWREMKSFREKVKEKSLAAMRH 

40 50 60 70 80 90 

130 140 150 160 170 180 

orf 8 8a . pep SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWG YIFAHVALIVICL 
I I I I I II I I I II I I I I I I I I I I I! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 
orf 1 10 SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWG YIFAHVALIVICL 

100 110 120 130 140 150 

190 200 210 220 230 240 

orf 8 8a . pep GGLI DSNLLLKLGMLTGRIVPDNQAVYAKDFKPESILGASNLSFRGNVNISEGQSADVVF 



250 260 270 280 290 300 

orf 8 8a. pep LNADNGILVQDLPFEVKLKKFHIDFYNTGMPRDFASDIEVTDKATGEKLERTIRVNHPLT 

orfllO SX 

However, ORF88 and ORF110 do not align, because they represent two different fragments of the
same protein.

Homology with a predicted ORF from N. gonorrhoeae

ORF110 shows 88.6% identity over a 211aa overlap with a predicted ORF (ORF110.ng) from N.
gonorrhoeae:

orfllO pep LLGIASVIGTLLQQNQPQTDYLVKFGSFWA 3 0 

I I ! II I I I I I : I I I I I I I I I I I I I I I II: 
orfllOng MSKSRISPTLLSRPWFAFFSSMRFAVALLSLLGIASVIGTVLQQNQPQTDYLVKFGPFWT 60 

orfllO pep XIFGFLGLYDVYASAWFWIMMFLWSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 90 

I j i II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I II I 1 M I I I I I 
orfllOng RIFDFLGLYDVYASAWFWIMMFLWSTSLCLIRNVPPFWREMKSFREKVKEKSLAAMRH 120 

orfllO .pep SSLLDVKIAPEVAKRYLEVQGFQGKTINREDGSVLIAAKKGTMNKWGYIFAHVALIVICL 150 

I I I I I I I I I I I I I II I I I I : I I I I I I : : I I I I II II I I I I I I I I I I I I I I I I I I I I I I I 
orfllOng SSLLDVKIAPEVAKRYLEVRGFQGKTVSREDGSVL IAAKKGTMNKWGYIXAHVALIVICL 18 0 

orfllO .pep GGLIDSNLLLKLGMLTGRIFRTIRRFMPRIXKPESXFGCVQSLIXGQRQYFXRGRVRMWF 210 

I II: 111111111:1 III: II I I I I MM : I I I I I I I I I I I I I I : I I I I I 
orfllOng GRLINXNLLLKLGMLAGSIFRNNRRVMPRISKPESIWGGVQSLIKGQRQYFQRGKVRMWF 24 0 

orfllO. pep S 211 

I 

orfllOng S 241 

The complete length ORF110ng nucleotide sequence <SEQ ID 443> is predicted to encode a
protein having amino acid sequence <SEQ ID 444>:

1 MSKSRISPTL LSRPWFAFFS SMRFAVALLS LLGIASVIGT VLQQNQPQTD
51 YLVKFGPFWT RIFDFLGLYD VYASAWFVVI MMFLVVSTSL CLIRNVPPFW
101 REMKSFREKV KEKSLAAMRH SSLLDVKIAP EVAKRYLEVR GFQGKTVSRE
151 DGSVLIAAKK GTMNKWGYIX AHVALIVICL GRLINXNLLL KLGMLAGSIF
201 RNNRRVMPRI SKPESIWGGV QSLIKGQRQY FQRGKVRMWF S*

Based on the putative transmembrane domains in the gonococcal protein, it is predicted that the 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 

Example 53 

The following DNA sequence was identified in N. meningitidis <SEQ ID 445>: 

1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCGTCT TGATATTTGC
51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG
101 TTACCCTGCA AGGCGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT
151 TCAAATAATC GGGACAAACT CCCCTCACCT GCCGAAATAC AAAAACGCAT
201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG
251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC
301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC GCCTGAACCG
351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT
401 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA
451 ATCAAACAGG CGGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA
501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG
551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA
601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGCGAGTT
651 GCACGGCAAA GGCAAAAACG CGCGCGGCGA ACCGTGGCGC ATCGGTATCG
701 AGCAGCCCAA TATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG
751 AACAACCGTT CGCTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA
801 TAAAAACGGC AAACGCCTCT CCCATATCAT CAACCCGAAC AACAAACGAC
851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGGTCGCAGA CAGTGCGATG
901 ACGGCGGACG GCTTGTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC
951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG
1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC
1051 CGCTAA

This corresponds to the amino acid sequence <SEQ ID 446; ORF111>:

1 MPSETRLPNF IRVLIFALGF IFLNACSEQT AQTVTLQGET MGTTYTVKYL
51 SNNRDKLPSP AEIQKRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR
101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ
151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE
201 LEKYGIQNYL VEIGGELHGK GKNARGEPWR IGIEQPNIVQ GGNTQIIVPL
251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISVVADSAM
301 TADGLSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL
351 R*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF111 shows 96.9% identity over a 351aa overlap with an ORF (ORF111a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 111a. pep MPSETRLPNFIRTLIFALSFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDXLPSP 
I I I I [ I i I I I I I : I I I I I : I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 111 MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP 



70 80 90 100 110 120 

orf 111a. pep AEIQXRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVHLNRLTH 
I I I I I I I I I I I i I I I I I I I I I 11 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I 
orf 111 AE IQKRI DDALKEVNRQMSTYQPD SEI SRFNQHTAGKPLRI S S DFAHVTAEAVRLNRLTH 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 111a. pep GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK 
I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I 
orf 111 GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTG I DKIILKQGKD YASLSKTHPK 

130 140 150 160 170 180 



190 200 210 220 230 240 

orf 111a. pep AYLDLSSIAKGFGVDXVAGELEKYGIQNYLVEIGGELHGKXKNARGEPWRIGIEQPNIVQ 
I I I I I I I I ! I! I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I ! I M I I i I I I I I I I 
orf 111 AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNARGEPWRIGIEQPNIVQ 

190 200 210 220 230 240 



250 260 270 280 290 300 

orf 111a. pep GGNTQIIVPLNNRSXATSGDYRIFHVDKSGKRLSHIINPNNKRPISHNLASISVXAD SAM 
I I I I I I I I I I I I I I I I I I I I! I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 111 GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISWADSAM 

250 260 270 280 290 300 



310 320 330 340 350 

TADGXSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX 
I I I I I I I I I I I I I I I I I I I I I 11 I I I I I I I I I I I I I I I I I I I I I I I M M I 
TADGLSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX 

310 320 330 340 350 



The complete length ORF111a nucleotide sequence <SEQ ID 447> is:



1 ATGCCGTCTG AAACACGCCT GCCGAACTTT ATCCGCACCT TGATATTTGC
51 CCTGAGTTTT ATCTTCCTGA ACGCCTGTTC GGAACAAACC GCGCAAACCG
101 TTACCCTGCA AGGTGAAACG ATGGGCACGA CCTATACCGT CAAATACCTT
151 TCAAATAATC GGGACNAACT CCCNTCACCT GCCGAAATAC AAAANCGCAT
201 CGATGACGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TATCAGCCCG
251 ACTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC
301 ATTTCAAGCG ACTTCGCACA CGTTACTGCC GAAGCCGTCC ACCTGAACCG
351 CCTGACACAC GGCGCGCTGG ACGTAACCGT CGGCCCCTTG GTCAACCTTT
401 GGGGATTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA
451 ATCAAACAAG CAGCATCTTA TACGGGCATA GACAAAATCA TTTTGAAACA
501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAG GCCTATTTGG
551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATNANGT TGCGGGCGAA
601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAATCG GCGGNGAGTT
651 GCACGGCAAA GNCAAAAACG CGCGCGGCGA ACCTTGGCGC ATCGGCATCG
701 AACAGCCCAA CATCGTCCAA GGCGGCAATA CGCAGATTAT CGTCCCGCTG
751 AACAACCGTT CGNTTGCCAC TTCCGGCGAT TACCGTATTT TCCACGTCGA
801 TAAAAGCGGC AAACGCCTCT CCCATATCAT TAATCCGAAC AACAAACGAC
851 CCATCAGCCA CAACCTCGCC TCCATCAGCG TGNTCGCAGA CAGTGCGATG
901 ACGGCGGACG GCTTNTCCAC AGGATTATTC GTATTGGGCG AAACCGAAGC
951 CTTAAAGCTG GCAGAGCGCG AAAAACTCGC TGTTTTCCTG ATTGTCAGGG
1001 ATAAAGGCGG CTACCGCACC GCCATGTCTT CCGAATTTGA AAAACTGCTC
1051 CGCTAA



This encodes a protein having amino acid sequence <SEQ ID 448>: 



1 MPSETRLPNF IRTLIFALSF IFLNACSEQT AQTVTLQGET MGTTYTVKYL
51 SNNRDXLPSP AEIQXRIDDA LKEVNRQMST YQPDSEISRF NQHTAGKPLR
101 ISSDFAHVTA EAVHLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ
151 IKQAASYTGI DKIILKQGKD YASLSKTHPK AYLDLSSIAK GFGVDXVAGE
201 LEKYGIQNYL VEIGGELHGK XKNARGEPWR IGIEQPNIVQ GGNTQIIVPL
251 NNRSXATSGD YRIFHVDKSG KRLSHIINPN NKRPISHNLA SISVXADSAM
301 TADGXSTGLF VLGETEALKL AEREKLAVFL IVRDKGGYRT AMSSEFEKLL
351 R*



Homology with a predicted ORF from N. gonorrhoeae 

ORF111 shows 96.6% identity over a 351aa overlap with a predicted ORF (ORF111.ng) from N.
gonorrhoeae:



orf111ng  MPSETRLPNLIRALIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP
          |||||||||:||:|||||||||||||||||||||||||||||||||||||||||||||||
orf111    MPSETRLPNFIRVLIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSP
                  10        20        30        40        50        60

                  70        80        90       100       110       120
orf111ng  AKIQKRIDDALKEVNRQMSTYQTDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH
          |:|||||||||||||||||||| |||||||||||||||||||||||||||||||||||||
orf111    AEIQKRIDDALKEVNRQMSTYQPDSEISRFNQHTAGKPLRISSDFAHVTAEAVRLNRLTH
                  70        80        90       100       110       120

                 130       140       150       160       170       180
orf111ng  GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILQQGKDYASLSKTHPK
          |||||||||||||||||||||||||||||||||||||||||||||:||||||||||||||
orf111    GALDVTVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILKQGKDYASLSKTHPK
                 130       140       150       160       170       180

                 190       200       210       220       230       240
orf111ng  AYLDLSSIAKGFGVDKVAGELEKYGIQNYLVEIGGELHGKGKNAHGEPWRIGIEQPNIIQ

                 250       260       270       280       290       300
orf111ng  GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVSDSAM
          |||||||||||||||||||||||||||||||||||||||||||||||||||||||:||||
orf111    GGNTQIIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVADSAM
                 250       260       270       280       290       300

                 310       320       330       340       350
orf111ng  TADGLSTGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKLLRX
          ||||||||||||||||||:|||:|||||||||||| |||||||||| |||||
orf111    TADGLSTGLFVLGETEALKLAEREKLAVFLIVRDKGGYRTAMSSEFEKLLRX
                 310       320       330       340       350

The complete length ORF111ng nucleotide sequence <SEQ ID 449> is:
1 ATGCCGTCTG AAACACGCCT GCCGAACCTT ATCCGCGCCT TGATATTTGC
51 CCTGGGTTTC ATCTTCCTGA ACGCCTGTTC GGaacaaacC GCGCAaaccg
101 TTACCCTGCA AGGCGAAAcg aTGGGTACGA CCTATACCGT CAAATACCTT
151 TCAAATAATC GGGACAAACT CCCCTCCCCT GCCAAAATAC AAAAGCGCAT
201 TGATGATGCG CTTAAAGAAG TCAACCGGCA GATGTCCACC TACCAGACCG
251 ATTCCGAAAT CAGCCGGTTC AACCAACACA CAGCCGGCAA GCCCCTCCGC
301 ATTTCAAGCG ATTTCGCACA CGTTACCGCC GAAGCCGTCC GCCTGAACCG
351 CCTGACTCAC GGCGCACTGG ACGTAACCGT CGGCCCTTTG GTCAACCTTT
401 GGGGGTTCGG CCCCGACAAA TCCGTTACCC GTGAACCGTC GCCGGAACAA
451 ATCAAACAGG CGGCATCTTA TACGGGCATA GACAAAATCA TTTTGCAACA
501 AGGCAAAGAT TACGCTTCCT TGAGCAAAAC CCACCCCAAA GCCTATTTGG
551 ATTTATCTTC GATTGCCAAA GGCTTCGGCG TTGATAAAGT TGCGGGCGAA
601 CTGGAAAAAT ACGGCATTCA AAATTATCTG GTCGAAAtcg gcggcGAGTT
651 GCACGGCAAA GGCAAAAATG CGCACGGCGA ACCGTGGCGC ATCGGTATAG
701 AGCAACCCAA TATCATCCAA GgcgGCAata CGCAGATTAt cgtcccgctg
751 aaCaaccgtt cgctTGCCAC TTCCGGCGAT TAccgtaTTT tccacgtcgA
801 TAAAAAcggc aaacgccttt cccacaTCAT CAATCCCaAC aacAAACgac
851 ccATCAGcca caacctcgcc tccatcagcg tggtctcAGA CAGTGCAATG
901 ACGGCGGACG GTTtatCCAC AGGATTATTT GTTTTAGGCG AAACCGAAGC
951 CTTAAGGCTG GCAGAACAAG AAAAACTCGC TGTTTTCCTA ATTGTCCGGG
1001 ATAAGGACGG CTACCGCACC GCCATGTCTT CCGAATTTGC CAAGCTGCTC
1051 CGCTAA

This encodes a protein having amino acid sequence <SEQ ID 450>:

1 MPSETRLPNL IRALIFALGF IFLNACSEQT AQTVTLQGET MGTTYTVKYL
51 SNNRDKLPSP AKIQKRIDDA LKEVNRQMST YQTDSEISRF NQHTAGKPLR
101 ISSDFAHVTA EAVRLNRLTH GALDVTVGPL VNLWGFGPDK SVTREPSPEQ
151 IKQAASYTGI DKIILQQGKD YASLSKTHPK AYLDLSSIAK GFGVDKVAGE
201 LEKYGIQNYL VEIGGELHGK GKNAHGEPWR IGIEQPNIIQ GGNTQIIVPL
251 NNRSLATSGD YRIFHVDKNG KRLSHIINPN NKRPISHNLA SISVVSDSAM
301 TADGLSTGLF VLGETEALRL AEQEKLAVFL IVRDKDGYRT AMSSEFAKLL
351 R*

This protein shows homology with a hypothetical lipoprotein precursor from H. influenzae:

sp|P44550|YOJL_HAEIN HYPOTHETICAL LIPOPROTEIN HI0172 PRECURSOR >gi|1074292|pir
hypothetical protein HI0172 - Haemophilus influenzae (strain Rd KW20)
>gi|1573128 (U32702) hypothetical [Haemophilus influenzae] Length = 346

Score = 353 bits (896), Expect = 9e-97 

Identities = 181/344 (52%), Positives = 247/344 (71%), Gaps = 4/344 (1%) 

Query: 7 LPNLIRALIFALGFIFLNACSEQTAQTVTLQGETMGTTYTVKYLSNNRDKLPSPAKIQKR 66
+ LI +I + L AC ++T + ++L G+TMGTTY VKYL + S K +

Sbjct: 1 MKKLISGIIAVAMALSLAACQKET-KVISLSGKTMGTTYHVKYLDDGSITATSE-KTHEE 58

Query: 67 IDDALKEVNRQMSTYQTDSEISRFNQHT-AGKPLRISSDFAHVTAEAVRLNRLTHGALDV 125
I+ LK+VN +MSTY+ DSE+SRFNQ+T P+ IS+DFA V AEA+RLN++T GALDV

Sbjct: 59 IEAILKDVNAKMSTYKKDSELSRFNQNTQVNTPIEISADFAKVLAEAIRLNKVTEGALDV 118

Query: 126 TVGPLVNLWGFGPDKSVTREPSPEQIKQAASYTGIDKIILQQGKDYASLSKTHPKAYLDL 185

TVGP+VNLWGFGP+K ++P+PEQ+ + ++ GIDKI L K+ A+LSK P+ Y+DL
Sbjct: 119 TVGPVVNLWGFGPEKRPEKQPTPEQLAERQAWVGIDKITLDTNKEKATLSKALPQVYVDL 178

Sbjct: 179 SSIAKGFGVDQVAEKLEQLNAQNYMVEIGGEIRAKGKNIEGKPWQIAIEKPTTTGERAVE 238

Query: 246 IIVPLNNRSLATSGDYRIFHVDKNGKRLSHIINPNNKRPISHNLASISVVSDSAMTADGL 305

++ LNN +A+SGDYRI+ ++NGKR +H I+P PI H+LASI+V++ ++MTADGL

Sbjct: 239 AVIGLNNMGMASSGDYRIY-FEENGKRFAHEIDPKTGYPIQHHLASITVLAPTSMTADGL 297

Query: 306 STGLFVLGETEALRLAEQEKLAVFLIVRDKDGYRTAMSSEFAKL 349

STGLFVLGE +AL +AE+ LAV+LI+R +G+ T SS F KL
Sbjct: 298 STGLFVLGEDKALEVAEKNNLAVYLIIRTDNGFVTKSSSAFKKL 341

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
Example 54

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 451>:





1 ..CCGTGCCGCC GACAGGGCGA CGACGTGTAT GCGGCGCACG CGTCCCGTCA
51 AAAATTGTGG CTGCGCTTCA TCGGCGGCCG GTCGCATCAA AATATACGGG
101 GCGGCGCGGC TGCGGACGGG TGGCGCAAAG GCGTGCAAAT CGGCGGCGAG
151 GTGTTTGTAC GGCAAAATGA AGGCAGCCkA yTGGCAATCG GCGTGATGGG
201 CGGCAGGGCC GGCCAGCACG CwTCAGTCAA CGGCAAAGGC GGTGCGGCAG
251 gCAGTGATTT GTATGGTTAT GgCGGGGgTG TTTATGCTgC GTGGCATCAG
301 TTGCGCGATA AACAAACGGG TgCGTATTTG GACGGCTGGT TGCAATACCA
351 ACGTTTCAAA CACCGCATCA ATGATGAAAA CCGTGCGGAA CgCTACAAAA
401 CCAAAGGTTG GACGGCTTCT GTCGAAGGCG GCTACAACGC GCTTGTGGCG
451 GAAGGCATTG TCGGAAAAGG CAATAATGTG CGGTTTTACC TACAACCGCA
501 GgCGCAGTTT ACCTACTTGG GCGTAAACGG CGGCTTTACC GACAGCGAGG
551 GGACGGCGGT CGGACTGCTC GGCAGCGGTC AGTGGCAAAG CCGCGCCGGC
601 AtTCGGGCAA AAACCCGTTT TGCTTTGCGT AACGGTGTCA ATCTTCAGCC
651 TTTTGCCGCT TTTAATGTtt TGCACAGGTC AAAATCTTTC GGCGTGGAAA
701 TGGACGGCGA AAAACAGACG CTGGCAGGCA GGACGGCACT CGAAGGGCGG
751 TTCGGTATTG AAGCCGGTTG GAAAGGCCAT ATGTCCGCA..






This corresponds to the amino acid sequence <SEQ ID 452; ORF35>: 


1 ..PCRRQGDDVY AAHASRQKLW LRFIGGRSHQ NIRGGAAADG WRKGVQIGGE
51 VFVRQNEGSX LAIGVMGGRA GQHASVNGKG GAAGSDLYGY GGGVYAAWHQ
101 LRDKQTGAYL DGWLQYQRFK HRINDENRAE RYKTKGWTAS VEGGYNALVA
151 EGIVGKGNNV RFYLQPQAQF TYLGVNGGFT DSEGTAVGLL GSGQWQSRAG
201 IRAKTRFALR NGVNLQPFAA FNVLHRSKSF GVEMDGEKQT LAGRTALEGR
251 FGIEAGWKGH MSA..









Computer analysis of this amino acid sequence gave the following results: 



Homology with putative secreted VirG-homologue of N. meningitidis (accession number
A32247)

ORF35 and the virg-h protein show 51% aa identity in a 261aa overlap:

Orf35 5 QGDDVYAAHASRQKLWLRFIGGRSHQNIRGGAA-ADGWRKGVQIGGEVFVRQNEGSXLAI 63 

+ D++ R+ LWLR I G S+Q ++G A +G+RKGVQ+GGEVF QNE + L+I 

virg-h 396 KNSDIFDRTLPRKGLWLRVIDGHSNQWVQGKTAPVEGYRKGVQLGGEVFTWQNESNQLSI 455 

Orf35 64 GVMGGRAGQHASVNGKG--GAAGSDLYGYGGGVYAAWHQLRDKQTGAYLDGWLQYQRFKH 121

G+MGG+A Q ++ + ++ G+G GVYA WHQL+DKQTGAY D W+QYQRF+H 

virg-h 456 GLMGGQAEQRSTFHNPDTDNLTTGNVKGFGAGVYATWHQLQDKQTGAYADSWMQYQRFRH 515 

Orf35 122 RINDENRAERYKTKGWTASVEGGYNALVAEGIVGKGNNVRFYLQPQAQFTYLGVNGGFTD 181 

RIN E+ ER+ +KG TAS+E GYNAL+AE KGN++R YLQPQAQ TYLGVNG F+D 
virg-h 516 RINTEDGTERFTSKGITASIEAGYNALLAEHFTKKGNSLRVYLQPQAQLTYLGVNGKFSD 575 

Orf35 182 SEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVNLQPFAAFNVLHRSKSFGVEMDGEKQTL 241 

SE V LLGS Q Q+R G++AK +F+L + ++PFAA N L+ +K FGVEMDGE++ + 
virg-h 576 SENAHVNLLGSRQLQTRVGVQAKAQFSLYKNIAIEPFAAVNALYHNKPFGVEMDGERRVI 635



virg-h 636 NNKTAIESQLGVAVKIKSHLT 656 

Homology with a predicted ORF from N. meningitidis (strain A)

ORF35 shows 96.9% identity over a 259aa overlap with an ORF (ORF35a) from strain A of N. 
meningitidis: 

10 20 30 

orf35.pep PCRRQGDDVYAAHASRQKLWLRFIGGRSHQNIRG
: | | | | | | | | | | | | | | | | | M I | I | I | | |
QRLAIPEAEAVLYAQQAYAANTLFGLRAADRGDDVYAADPSRQKLWLRFIGGRSHQNIRG
310 320 330 340 350 360 

40 50 60 70 80 90 

GAAADGWRKGVQIGGEVFVRQNEGSXLAIGVMGGRAGQHASVNGKGGAAGSDLYGYGGGV 



370 380 390 400 410 420 

100 110 120 130 140 150 

YAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGIV 

1 I I I I 1 I I I I I I I I I I I I I I I i I I I I I I : I 

YAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAEGVV
430 440 450 460 470 480 

160 170 180 190 200 210 

GKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVN 
I If I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I M i I I I I I I I M 

GKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRNGVN 
490 500 510 520 530 540 

220 230 240 250 260 

LQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSA
I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
LQPFAAFNVLHRSKSFGVEMDGEKQTLAGRTALEGRFGIEAGWKGHMSARIGYGKRTDGD 
550 560 570 580 590 600 



KEAALSLKWLFX 
610 620 

The complete length ORF35a nucleotide sequence <SEQ ID 453> is: 

1 ATGTTCAGAG CTCAGCTTGG TTCAAATACT CGTTCTACCA AAATCGGCGA
51 CGATGCCGAT TTTTCATTTT CAGACAAGCC GAAACCCGGC ACTTCCCATT
101 ATTTTTCCAG CGGTAAAACC GATCAAAATT CATCCGAATA TGGGTATGAC
151 GAAATCAATA TCCAAGGTAA AAACTACAAT AGCGGCATAC TCGCCGTCGA
201 TAATATGCCC GTTGTTAAGA AATATATTAC AGATACTTAC GGGGATAATT
251 TAAAGGATGC GGTTAAGAAG CAATTACAGG ATTTATACAA AACAAGACCC
301 GAAGCTTGGG AAGAAAATAA AAAACGGACT GAGGAGGCGT ATATAGAACA
351 GCTTGGACCA AAATTTAGTA TACTCAAACA GAAAAACCCC GATTTAATTA
401 ATAAATTGGT AGAAGATTCC GTACTCACTC CTCATAGTAA TACATCACAG
451 ACTAGTCTCA ACAACATCTT CAATAAAAAA TTACACGTCA AAATCGAAAA
501 CAAATCCCAC GTCGCCGGAC AGGTGTTGGA ACTGACCAAG ATGACGCTGA
551 AAGATTCCCT TTGGGAACCG CGCCGCCATT CCGACATCCA TATGCTGGAA
601 ACTTCCGATA ATGCCCGCAT CCGCCTGAAC ACGAAAGATG AAAAACTGAC
651 CGTCCATAAA GCGTATCAGG GCGGTGCGGA TTTCCTGTTC GGCTACGACG
701 TGCGGGAGTC GGACAAACCC GCCCTGACCT TTGAAGAAAA AGTCAGCGGA
751 CAATCCGGCG TGGTTTTGGA ACGCCGGCCG GAAAATCTGA AAACGCTCGA
801 CGGGCGCAAA CTGATTGCGG CGGAAAAGGC AGACTCTAAT TCGTTTGCGT
851 TTAAACAAAA TTACCGGCAG GGACTGTACG AATTATTGCT CAAGCAATGC
901 GAAGGCGGAT TTTGCTTGGG CGTGCAGCGT TTGGCTATCC CCGAGGCGGA
951 AGCGGTTTTA TATGCCCAAC AGGCTTATGC GGCAAATACT TTGTTCGGGC
1001 TGCGTGCCGC CGACAGGGGC GACGACGTGT ATGCCGCCGA TCCGTCCCGT
1051 CAAAAATTGT GGCTGCGCTT CATCGGCGGC CGGTCGCATC AAAATATACG
1101 GGGCGGCGCG GCTGCGGACG GGCGGCGCAA AGGCGTGCAA ATCGGCGGCG
1151 AGGTGTTTGT ACGGCAAAAT GAAGGCAGCC GGCTGGCAAT CGGCGTGATG
1201 GGCGGCAGGG CTGGCCAGCA CGCATCAGTC AACGGCAAAG GCGGTGCGGC
1251 AGGCAGTTAT TTGCATGGTT ATGGCGGGGG TGTTTATGCT GCGTGGCATC
1301 AGTTGCGCGA TAAACAAACG GGTGCGTATT TGGACGGCTG GTTGCAATAC
1351 CAACGTTTCA AACACCGCAT CAATGATGAA AACCGTGCGG AACGCTACAA
1401 AACCAAAGGT TGGACGGCTT CTGTCGAAGG CGGCTACAAC GCGCTTGTGG
1451 CGGAAGGCGT TGTCGGAAAA GGCAATAATG TGCGGTTTTA CCTGCAACCG
1501 CAGGCGCAGT TTACCTACTT GGGCGTAAAC GGCGGCTTTA CCGACAGCGA
1551 GGGGACGGCG GTCGGACTGC TCGGCAGCGG TCAGTGGCAA AGCCGCGCCG
1601 GCATTCGGGC AAAAACCCGT TTTGCTTTGC GTAACGGTGT CAATCTTCAG
1651 CCTTTTGCCG CTTTTAATGT TTTGCACAGG TCAAAATCTT TCGGCGTGGA
1701 AATGGACGGC GAAAAACAGA CGCTGGCAGG CAGGACGGCG CTCGAAGGGC
1751 GGTTCGGCAT TGAAGCCGGT TGGAAAGGCC ATATGTCCGC ACGCATCGGA
1801 TACGGCAAAA GGACGGACGG CGACAAAGAA GCCGCATTGT CGCTCAAATG
1851 GCTGTTTTGA



This encodes a protein having amino acid sequence <SEQ ID 454>:

1 MFRAQLGSNT RSTKIGDDAD FSFSDKPKPG TSHYFSSGKT DQNSSEYGYD
51 EINIQGKNYN SGILAVDNMP VVKKYITDTY GDNLKDAVKK QLQDLYKTRP
101 EAWEENKKRT EEAYIEQLGP KFSILKQKNP DLINKLVEDS VLTPHSNTSQ
151 TSLNNIFNKK LHVKIENKSH VAGQVLELTK MTLKDSLWEP RRHSDIHMLE
201 TSDNARIRLN TKDEKLTVHK AYQGGADFLF GYDVRESDKP ALTFEEKVSG
251 QSGVVLERRP ENLKTLDGRK LIAAEKADSN SFAFKQNYRQ GLYELLLKQC
301 EGGFCLGVQR LAIPEAEAVL YAQQAYAANT LFGLRAADRG DDVYAADPSR
351 QKLWLRFIGG RSHQNIRGGA AADGRRKGVQ IGGEVFVRQN EGSRLAIGVM
401 GGRAGQHASV NGKGGAAGSY LHGYGGGVYA AWHQLRDKQT GAYLDGWLQY
451 QRFKHRINDE NRAERYKTKG WTASVEGGYN ALVAEGVVGK GNNVRFYLQP
501 QAQFTYLGVN GGFTDSEGTA VGLLGSGQWQ SRAGIRAKTR FALRNGVNLQ
551 PFAAFNVLHR SKSFGVEMDG EKQTLAGRTA LEGRFGIEAG WKGHMSARIG
601 YGKRTDGDKE AALSLKWLF*

Homology with a predicted ORF from N. gonorrhoeae

ORF35 shows 51.7% identity over a 261aa overlap with a predicted ORF (ORF35ngh) from N. 
gonorrhoeae: 

orf35.pep PCRRQGDDVYAAHASRQKLWLRFIGGRSHQNIRG 34

orf35ngh FTKVQERDDIAIYAQQAQAANTLFALRLNDKNSDIFDRTLPRKGLWLRVIDGHSNQWVQG 370

orf35.pep GAA-ADGWRKGVQIGGEVFVRQNEGSXLAIGVMGGRAGQHASVNGKG--GAAGSDLYGYG 91

orf35ngh KTAPVEGYRKGVQLGGEVFTWQNESNQLSIGLMGGQAEQRSTFRNPDTDNLTTGNVKGFG 430

orf35.pep GGVYAAWHQLRDKQTGAYLDGWLQYQRFKHRINDENRAERYKTKGWTASVEGGYNALVAE 151

: I I I I : I I M : I I I I I I! : I : I : I I I I I : I I I I I : I I : : I I ! I! : \ : I I I I I : I I 

orf 35ngh AGVYATWHQLQDKQTGAYVDSWMQYQRFRHRINTEYATERFTSKGITASIEAGYNALLAE 4 90 

orf 35 . pep GIVGKGNNVRFYLQPQAQFTYLGVNGGFTDSEGTAVGLLGSGQWQSRAGIRAKTRFALRN 211 

: : I I I : : I I I i I I I I : I I I I I I I I : I I I : : I : I I I I I I I I : I : : 1 I : : I I : I 
orf35ngh HFTKKGNSLRVYLQPQAQLTYLGVNGKFSDSENAQVNLLGSRQLQSRVGVQAKAQFAFTN 550 

orf 35 .pep GVNLQP FAAFNVLHRS KS FGVEMDGEKQTLAGRT ALE GR FG IE AGWKGHMS A 2 63 

I I : : I I I : I I : : : : I I 1 1 I : I I : : : : : : : I : : I : : I : I I : I : : 

orf35ngh GVTFQPFVAVNSIYQQKPFGVEIDGDRRVINNKTVIETQLGVAAKIKSHLTLQASFNRQT 610 

A partial ORF35ngh nucleotide sequence <SEQ ID 455> is predicted to encode a protein having 
partial amino acid sequence <SEQ ID 456>:

1 ..KKLRDRNSEY WKEETYHIKS NGRTYPNIPA LFPKHPFDPF ENINNSKKIS 

51 FYDKEYTEDY LVGFARGFGV EKRNGEEEKP LRQYFKDCVN TENSNNDNCK 

101 ISSFGNYGPI LIKSDIFALA SQIKNSHINS EILSVGNYIE WLRPTLNKLT 

151 GWQEHLYAGL DPFHYIEVTD NSHVIGQTID LGALELTNSL WKPRWNSNID 

201 YLITKNAEIR FNTKNESLLV KEDYAGGARF RFAYDLKDKV PEIPVLTFEK 

251 NITGTSDIIF EGKALDNLKH LDGHQIVKVN DTADKDAFRL SSKYRKGIYT 

301 LSLQQRPEGF FTKVQERDDI AIYAQQAQAA NTLFALRLND KNSDIFDRTL 

351 PRKGLWLRVI DGHSNQWVQG KTAPVEGYRK GVQLGGEVFT WQNESNQLSI 

401 GLMGGQAEQR STFRNPDTDN LTTGNVKGFG AGVYATWHQL QDKQTGAYVD

451 SWMQYQRFRH RINTEYATER FTSKGITASI EAGYNALLAE HFTKKGNSLR 

501 VYLQPQAQLT YLGVNGKFSD SENAQVNLLG SRQLQSRVGV QAKAQFAFTN 

551 GVTFQPFVAV NSIYQQKPFG VEIDGDRRVI NNKTVIETQL GVAAKIKSHL 

601 TLQASFNRQT SKHHHAKQGA LNLQWTF* 

Based on this prediction, these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes,
could be useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 55 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 457>: 
1 ..GCGGAATATG TTCAGTTCTC TATAGATTTG TTCAGTGTGG GTAAATCGGG

51 GGGCGGTATA CCTAAGGCTA AGCCTGTGTT TGATGCGAAA CCGAGATGGG 

101 AGGTTGATAG GAAGCTTAAT AAATTGACAA CTCGTGAGCA GGTGGAGAAA 

151 AATGTTCAGG AAACGAGAAG AAGGAGTCAG AGTAGTCAGT TTAAAGCCCA 

201 TGCGCAACGA GAATGGGAAA ATAAAACAGG GTTAGATTTT AATCATTTTA 

251 TAGGTGGTGA TATCAATAAA AAAGGCACAG TAACAGGAGG GCATAGTCTA

301 ACCCGTGGTG ATGTACGGGT GATACAACAA ACCTCGGCAC CTGATAAACA 

351 TGGGGT.TTA TCAAGCGACA GTGGAAATTN A 

This corresponds to the amino acid sequence <SEQ ID 458; ORF46>: 

1 . . AEYVQFSIDL FSVGKSGGGI PKAKPVFDAK PRWEVDRKLN KLTTREQVEK 
51 NVQETRRRSQ SSQFKAHAQR EWENKTGLDF NHFIGGDINK KGTVTGGHSL 
101 TRGDVRVIQQ TSAPDKHGXL SSDSGNX 
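
As a hedged illustration of how a nucleotide sequence such as <SEQ ID 457> maps onto its amino acid sequence, the following Python sketch translates a reading frame using the standard genetic code. The function name `translate` and the decision to report any codon containing an undetermined base (for example N or ".") as X are assumptions made for illustration; the software actually used for the analysis is not identified in this section.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, frame: int = 0) -> str:
    """Translate a DNA string; whitespace is ignored, unknown codons become 'X'."""
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(frame, len(seq) - 2, 3):
        # Codons containing anything other than A, C, G, T are not in the table.
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


# First 30 nucleotides of <SEQ ID 457> as listed above.
print(translate("GCGGAATATG TTCAGTTCTC TATAGATTTG"))  # AEYVQFSIDL
```

The printed residues agree with the first ten residues listed for <SEQ ID 458>.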

Further work revealed further partial nucleotide sequence <SEQ ID 459>: 

1 . . GCAGTGTGCC TnCCGATGCA TGCACACGCC TCAnATTTGG CAAACGATTC 

51 TTTTATCCGG CAGGTTCTCG ACCGTCAGCA TTTCGAACCC GACGGGAAAT 

101 ACCACCTATT CGGCAGCAGG GGGGAACTTG CCGAGCGCCA GTCTCATATC 

151 GGATTGGGAA AAATACAAAG CCATCAGTTG GGCAACCTGA TGATTCAACA 

201 GGCGGCCATT AAAGGAAATA TCGGCTACAT TGTCCGCTTT TCCGATCACG

251 GGCACGAAGT CCATTCCCCs TTCGACAACC ATGCCTCACA TTCCGATTCT

301 GATGAAGCCG GTAGTCCCGT TGACGGATTT AGCCTTTACC GCATCCATTG

351 GGACGGATAC GAACACCATC CCGCCGACGG CTATGACGGG CCACAGGGCG

401 GCGGCTATCC CGCTCCCAAA GGCGCGAGGG ATATATACAG TTACGACATA

451 AAAGGCGTTG CCCAAAATAT CCGCCTCAAC CTGACCGACA ACCGCAGCAC

501 CGGACAACGG CTTGCCGACC GTTTCCACAA TGCCGGTAGT ATGCTGACGC

551 AAGGAGTAGG CGACGGATTC AAACGCGCCA CCCGATACAG CCCCGAGCTG

601 GACAGATCGG GCAATGCCGC CGAAGCCTTC AACGGCACTG CAGATATCGT

651 TAAAAACATC ATCGGCGCTG CAGGAGAAAT TGT 

This corresponds to the amino acid sequence <SEQ ID 460; ORF46-1>:

1 ..AVCLPMHAHA SXLANDSFIR QVLDRQHFEP DGKYHLFGSR GELAERQSHI

51 GLGKIQSHQL GNLMIQQAAI KGNIGYIVRF SDHGHEVHSP FDNHASHSDS

101 DEAGSPVDGF SLYRIHWDGY EHHPADGYDG PQGGGYPAPK GARDIYSYDI

151 KGVAQNIRLN LTDNRSTGQR LADRFHNAGS MLTQGVGDGF KRATRYSPEL

201 DRSGNAAEAF NGTADIVKNI IGAAGEI

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. gonorrhoeae

ORF46 shows 98.2% identity over a 111aa overlap with a predicted ORF (ORF46ng) from N.
gonorrhoeae:

orf46.pep AEYVQFSIDLFSVGKSGGGIPKAKPVFDAKPRWEVDRKLNKLTTR 45 

I I I I 1 I I I I I I I I I I I I I ] I ! I I I I I I I I I 
orf46ng PKTGVPFDGKGFPNFEKHVKYDTKLDIQELSGGGIPKAKPVFDAKPRWEVDRKLNKLTTR 217

orf46.pep EQVEKNVQETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDV 105

I I I I I I I I I I I I I I I I I I I I I I I 1 I! I I I I I I I I I I I I I I I I I I I I I : I II I I I I I I I I I 
orf46ng EQVEKNVQETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGAVTGGHSLTRGDV 277

orf46.pep RVIQQTSAPDKHGXLSSDSGN 126 

I I I I I I I M I II I I I I I I I I 
orf46ng RVIQQTSAPDKHGVLSSDSGN 298
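
The percent identity quoted for an overlap can be checked from a pair of aligned rows. The Python sketch below is illustrative only: the function name `percent_identity` is hypothetical, and it assumes that identity is the number of columns with the same residue divided by the number of aligned columns, that a gap is written as "-", and that an undetermined residue (X) counts as a mismatch. The alignment program used in the original analysis may define the denominator differently.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent of aligned columns in which both rows carry the same residue."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    # Ignore columns where both rows are gaps; keep every other column.
    columns = [(a, b) for a, b in zip(row_a.upper(), row_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns")
    identical = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * identical / len(columns)


# Last row of the ORF46 / ORF46ng alignment shown above.
orf46_row = "RVIQQTSAPDKHGXLSSDSGN"
orf46ng_row = "RVIQQTSAPDKHGVLSSDSGN"
print(round(percent_identity(orf46_row, orf46ng_row), 1))  # 95.2
```

Under the same assumptions, the two mismatching columns that appear in the full 111aa overlap (T/A and X/V) would give 109/111, or about 98.2%.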

A partial ORF46ng nucleotide sequence <SEQ ID 461> is predicted to encode a protein having
partial amino acid sequence <SEQ ID 462>: 

1 . . RRLKHCCHAR LGSAFHRKQD GAHQRFGRYG ATQRLCRSSH PRLGSPKPQC 

51 RTRHRSRQQY LYGSHPHQRD WSCPGKIQLG RHHGTSCRAV ADXRDRICER

101 EIRRQRQXCR CRLGKIPSLS IPKYPLKLEQ RYGKENITSS TVPPSNGKNV 

151 KLADQRHPKT GVPFDGKGFP NFEKHVKYDT KLDIQELSGG GIPKAKPVFD 
201 AKPRWEVDRK LNKLTTREQV EKNVQETRRR SQSSQFKAHA QREWENKTGL

251 DFNHFIGGDI NKKGAVTGGH SLTRGDVRVI QQTSAPDKHG VLSSDSGN* 

Further work revealed the complete gonococcal DNA sequence <SEQ ID 463>: 



1 TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG
51 CCTGCCGATG CATGCACACG CCTCAGATTT GGcaAACGAT CCCTTTATCC
101 GgCaggttcT CGaccGTCAG CATTTCGaac ccgacggGAa ATACCaCCTA
151 TTcggCaGCA GGGGGGAGCT TgccnagcGC aacggccATa tcggattggG
201 aaacaTAcaa Agccatcagt tGggccacct gatgattcaa caggcggccg
251 ttgaaggaaA TAtcgGctac attgtccgct tttccgatca cgggcacaaa
301 .......... ccttcGAcaa ccaTGCCTCA CATTCCGATT CTGACGAAGC
351 CGGTAGTCCC GTTGACGGAT TCAGCCTTTA CCGCATCCAT TGGGACGGAT
401 ACGAACACCA TCCCGCCGAC GGCTATGACG GGCCACAGGG CGGCGGCTAT
451 CCCGCTCCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCGT
501 TGCCCAAAAT ATCCGCCTCA ACCTGACCGA CAACCGCAGC ACCGGACAAC
551 GGCTTGCCGA CCGTTTCCAC AATGCCGGCG CTATGCTGAC GCAAGGAGTA
601 GGCGACGGAT TCAAACGCGC CACCCGATAC AGCCCCGAGC TGGACAGATC
651 GGGCAATGCc gccGAAGCCT TCAACGGCAC TGCAGATATC GTCAAAAACA
701 TCATCGGCGC GGCAGGAGAA ATTGTCGGCG CAGGCGATGC CGTGCagGGT
751 ATAAGCGAAG GCTCAAACAT TGCTGTCATG CACGGCTTGG GTCTGCTTTC
801 CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC
851 TCAAAGACTA TGCCGCAGCA GCCATCCGCG ATTGGGCAGT CCAAAACCCC
901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA TGGCAGCCAT
951 CCCCATCAAA GGGATTGGAG CTGTCCGGGG AAAATACGGC TTGGGCGGCA
1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGCGAT CGCATTGCCG
1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA
1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC
1151 GTTACGGCAA AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGC
1201 AAAAATGTCA AACTGGCAGA CCAACGCCAC CCGAAGACAG GCGTACCGTT
1251 TGACGGTAAA GGGTTTCCGA ATTTTGAGAA GCACGTGAAA TATGATACGA
1301 AGCTCGATAT TCAAGAATTA TCGGGGGGCG GTATACCTAA GGCTAAGCCT
1351 GTGTTTGATG CGAAACCGAG ATGGGAGGTT GATAGGAAGC TTAATAAATT
1401 GACAACTCGT GAGCAGGTGG AGAAAAATGT TCAGGAAACG AGAAGAAGGA
1451 GTCAGAGTAG TCAGTTTAAA GCCCATGCGC AACGAGAATG GGAAAATAAA
1501 ACAGGGTTAG ATTTTAATCA TTTTATAGGT GGTGATATCA ATAAGAAAGG
1551 CACAGTAACA GGAGGGCATA GTCTAACCCG TGGTGATGTA CGGGTGATAC
1601 AACAAACCTC GGCACCTGAT AAACATGGGG TTTATCAAGC GACAGTGGAA
1651 ATTAAAAAGC CTGATGGAAG TTGGGAGGTG AAAACGAAAA AAGGTGGGAA
1701 AGTGATGACC AAGCACACCA TGTTCCCAAA AGATTGGGAT GAGGCTAGAA
1751 TTAGGGCTGA AGTTACTTCG GCTTGGGAAA GTAGAATAAT GCTTAAGGAT
1801 AATAAATGGC AGGGTACAAG TAAATCGGGT ATTAAAATAG AAGGATTTAC
1851 CGAACCTAAT AGAACAGCAT ATCCCATTTA TGAATAG



This corresponds to the amino acid sequence <SEQ ID 464; ORF46ng-1>:



1 LGISRKISLI LSILAVCLPM HAHASDLAND PFIRQVLDRQ HFEPDGKYHL
51 FGSRGELAXR NGHIGLGNIQ SHQLGHLMIQ QAAVEGNIGY IVRFSDHGHK
101 FHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY
151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLADRFH NAGAMLTQGV
201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG
251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP
301 NAAQGIEAVS NIFMAAIPIK GIGAVRGKYG LGGITAHPVK RSQMGAIALP
351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG
401 KNVKLADQRH PKTGVPFDGK GFPNFEKHVK YDTKLDIQEL SGGGIPKAKP
451 VFDAKPRWEV DRKLNKLTTR EQVEKNVQET RRRSQSSQFK AHAQREWENK
501 TGLDFNHFIG GDINKKGTVT GGHSLTRGDV RVIQQTSAPD KHGVYQATVE
551 IKKPDGSWEV KTKKGGKVMT KHTMFPKDWD EARIRAEVTS AWESRIMLKD
601 NKWQGTSKSG IKIEGFTEPN RTAYPIYE*



ORF46ng-1 and ORF46-1 show 94.7% identity in 227 aa overlap:



orf46-1.pep AVCLPMHAHASXLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER
I I I I I I I I I I I MM I I I I M M M I M M M M M M M M I
orf46ng-1 LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR

orf46-1.pep QSHIGLGKIQSHQLGNLMIQQAAIKGNIGYIVRFSDHGHEVHSPFDNHASHSDSDEAGSP
: : | | | | | : I I I I I I I : I 1 1 I I I 1 : : I I I 1 II I I 1 I I I I 1 : I I I I I I I I I I I I I I II II I
orf46ng-1 NGHIGLGNIQSHQLGHLMIQQAAVEGNIGYIVRFSDHGHKFHSPFDNHASHSDSDEAGSP
70 80 90 100 110 120

110 120 130 140 150 160
orf46-1.pep VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS
i | i I I I I I I I I I II M I I I I I I I I M M I I I I I II I II I I I I I II II II I I I I I i 1 I I I I
orf46ng-1 VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS
130 140 150 160 170 180

170 180 190 200 210 220
orf46-1.pep TGQRLADRFHNAGSMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE
II I I I I I I I I I I I : I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I
orf46ng-1 TGQRLADRFHNAGAMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE
190 200 210 220 230 240

orf46-1.pep I
I
orf46ng-1 I

Homology with a predicted ORF from N. meningitidis (strain A) 
ORF46ng-1 shows 87.4% identity over a 486aa overlap with an ORF (ORF46a) from strain A of
N. meningitidis: 

10 20 30 40 50 60 

orf46a.pep LGISRKISLILSILAVCLPMHAHASDLANDSFIRQVLDRQHFEPDGKYHLFGSRGELAER
I I I I I I I I I M II I I I I I II I I II I I I I II 1 II I I I I I I I I I I I I I M M II I I I II I
orf46ng-1 LGISRKISLILSILAVCLPMHAHASDLANDPFIRQVLDRQHFEPDGKYHLFGSRGELAXR
10 20 30 40 50 60

70 80 90 100 110 120
orf46a.pep SGHIGLGNIQSHQLGNLFIQQAAIKGNIGYIVRFSDHGHEVHSPFDNHASHSDSDEAGSP
: I I I I I I I I I i I I I I : I : I I I I I : : I I I II I II I I I I I I : I I I I I I I I 11 M I 1 I I I II
orf46ng-1 NGHIGLGNIQSHQLGHLMIQQAAVEGNIGYIVRFSDHGHKFHSPFDNHASHSDSDEAGSP
70 80 90 100 110 120

130 140 150 160 170 180
orf46a.pep VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS
I II I I I I I I I I I II I I I I I I I I I I II I I I I I I M II I I I I I I I I I I I I I I I I I I I II I I I
orf46ng-1 VDGFSLYRIHWDGYEHHPADGYDGPQGGGYPAPKGARDIYSYDIKGVAQNIRLNLTDNRS
130 140 150 160 170 180

190 200 210 220 230 240
orf46a.pep TGQRLVDRFHNTGSMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE
I I I I I : I I I II : I : I I I I I I M I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I
orf46ng-1 TGQRLADRFHNAGAMLTQGVGDGFKRATRYSPELDRSGNAAEAFNGTADIVKNIIGAAGE
190 200 210 220 230 240

250 260 270 280 290 300
orf46a.pep IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP
I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I i II II M I I I
orf46ng-1 IVGAGDAVQGISEGSNIAVMHGLGLLSTENKMARINDLADMAQLKDYAAAAIRDWAVQNP
250 260 270 280 290 300

310 320 330 340 350 360
orf46a.pep NAAQGIEAVSNIFTAVIPVKGIGAVRGKYGLGGITAHPVKRSQMGEIALPKGKSAVSDNF
I I I II I I I II I I I I : I I : I M I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I II I I I i I i I
orf46ng-1 NAAQGIEAVSNIFMAAIPIKGIGAVRGKYGLGGITAHPVKRSQMGAIALPKGKSAVSDNF
310 320 330 340 350 360

370 380 390 400 410 420
orf46a.pep ADAAYAKYPSPYHSRNIRSNLEQRYGKENITSSTVPPSNGKNVKLANKRHPKTKVPFDGK
I I I I I I I I I I I I I I I I I I I I II I I I I I M I I I I I I I I I I I I I M I I :: I I I I I I I M II
orf46ng-1 ADAAYAKYPSPYHSRNIRSNLEQRYGKENITSSTVPPSNGKNVKLADQRHPKTGVPFDGK
370 380 390 400 410 420

430 440 450 460 470 

orf46a.pep GFPNFEKDVKYDTRINTAVPQVN PIDEPVFN--PKGSVGSAHSWSITARIQYAKLP

I I Mill::: : ::: I :|||: I: I = ::hl I I
orf46ng-1 GFPNFEKHVKYDTKLD--IQELSGGGIPKAKPVFDAKPRWEVDRKLN-KLTTREQVEKNV
430 440 450 460 470 



480 490 500 510 520 530 

orf46a.pep RQGRIRYIPPKNYSPSAPLPKGPNNGYLDKFGNEWTKGPSRTKGQEFEWDVQLSKTGREQ
:: I I 

orf46ng-1 QETRRRSQSSQFKAHAQREWENKTGLDFNHFIGGDINKKGTVTGGHSLTRGDVRVIQQTS

480 490 500 510 520 530 

The complete length ORF46a DNA sequence <SEQ ID 465> is: 

1 TTGGGCATTT CCCGCAAAAT ATCCCTTATT CTGTCCATAC TGGCAGTGTG 

51 CCTGCCGATG CATGCACACG CCTCAGATTT GGCAAACGAT TCTTTTATCC 

101 GGCAGGTTCT CGACCGTCAG CATTTCGAAC CCGACGGGAA ATACCACCTA 

151 TTCGGCAGCA GGGGGGAACT TGCCGAGCGC AGCGGTCATA TCGGATTGGG 

201 AAACATACAA AGCCATCAGT TGGGCAACCT GTTCATCCAG CAGGCGGCCA 

251 TTAAAGGAAA TATCGGCTAC ATTGTCCGCT TTTCCGATCA CGGGCACGAA 

301 GTCCATTCCC CCTTCGACAA CCATGCCTCA CATTCCGATT CTGATGAAGC 

351 CGGTAGTCCC GTTGACGGAT TCAGCCTTTA CCGCATCCAT TGGGACGGAT 

401 ACGAACACCA TCCCGCCGAC GGCTATGACG GGCCACAGGG CGGCGGCTAT 

451 CCCGCTCCCA AAGGCGCGAG GGATATATAC AGCTACGACA TAAAAGGCGT 

501 TGCCCAAAAT ATCCGCCTCA ACCTGACCGA CAACCGCAGC ACCGGACAAC 

551 GGCTTGTCGA CCGTTTCCAC AATACCGGTA GTATGCTGAC GCAAGGAGTA 

601 GGCGACGGAT TCAAACGCGC CACCCGATAC AGCCCCGAGC TGGACAGATC 

651 GGGCAATGCC GCCGAAGCTT TCAACGGCAC TGCAGATATC GTCAAAAACA 

701 TCATCGGCGC GGCAGGAGAA ATTGTCGGCG CAGGCGATGC CGTGCAGGGT

751 ATAAGCGAAG GCTCAAACAT TGCTGTTATG CACGGCTTGG GTCTGCTTTC

801 CACCGAAAAC AAGATGGCGC GCATCAACGA TTTGGCAGAT ATGGCGCAAC 

851 TCAAAGACTA TGCCGCAGCA GCCATGCGCG ATTGGGCAGT CCAAAACCCC 

901 AATGCCGCAC AAGGCATAGA AGCCGTCAGC AATATCTTTA CGGCAGTCAT 

951 CCCCGTCAAA GGGATTGGAG CTGTTCGGGG AAAATACGGC TTGGGCGGCA 

1001 TCACGGCACA TCCTGTCAAG CGGTCGCAGA TGGGCGAGAT CGCATTGCCG 

1051 AAAGGGAAAT CCGCCGTCAG CGACAATTTT GCCGATGCGG CATACGCCAA 

1101 ATACCCGTCC CCTTACCATT CCCGAAATAT CCGTTCAAAC TTGGAGCAGC 

1151 GTTACGGCAA AGAAAACATC ACCTCCTCAA CCGTGCCGCC GTCAAACGGA 

1201 AAGAATGTGA AACTGGCAAA CAAACGCCAC CCGAAGACCA AAGTGCCGTT 

1251 TGACGGTAAA GGGTTTCCGA ATTTTGAAAA AGACGTAAAA TACGATACGA 

1301 GAATTAATAC CGCTGTACCA CAAGTGAATC CTATAGATGA ACCCGTCTTT 

1351 AATCCTAAAG GTTCTGTCGG ATCGGCTCAT TCTTGGTCTA TAACTGCCAG 

1401 AATTCAATAC GCAAAATTAC CAAGGCAAGG TAGAATCAGA TATATCCCAC

1451 CTAAAAATTA CTCTCCTTCA GCACCGCTAC CAAAAGGACC TAATAATGGA 

1501 TATTTGGATA AATTTGGTAA TGAATGGACT AAAGGTCCAT CAAGAACTAA 

1551 AGGTCAAGAA TTTGAATGGG ATGTTCAATT GTCTAAAACA GGAAGAGAGC 

1601 AACTTGGATG GGCTAGTAGG GATGGTAAGC ATTTAAATAT ATCAATTGAT 

1651 GGAAAGATTA CACACAAATG A

This corresponds to the amino acid sequence <SEQ ID 466>: 



1 LGISRKISLI LSILAVCLPM HAHASDLAND SFIRQVLDRQ HFEPDGKYHL

51 FGSRGELAER SGHIGLGNIQ SHQLGNLFIQ QAAIKGNIGY IVRFSDHGHE 

101 VHSPFDNHAS HSDSDEAGSP VDGFSLYRIH WDGYEHHPAD GYDGPQGGGY 

151 PAPKGARDIY SYDIKGVAQN IRLNLTDNRS TGQRLVDRFH NTGSMLTQGV 

201 GDGFKRATRY SPELDRSGNA AEAFNGTADI VKNIIGAAGE IVGAGDAVQG

251 ISEGSNIAVM HGLGLLSTEN KMARINDLAD MAQLKDYAAA AIRDWAVQNP 

301 NAAQGIEAVS NIFTAVIPVK GIGAVRGKYG LGGITAHPVK RSQMGEIALP 

351 KGKSAVSDNF ADAAYAKYPS PYHSRNIRSN LEQRYGKENI TSSTVPPSNG 

401 KNVKLANKRH PKTKVPFDGK GFPNFEKDVK YDTRINTAVP QVNPIDEPVF

451 NPKGSVGSAH SWSITARIQY AKLPRQGRIR YIPPKNYSPS APLPKGPNNG

501 YLDKFGNEWT KGPSRTKGQE FEWDVQLSKT GREQLGWASR DGKHLNISID 

551 GKITHK* 
Based on this analysis, including the presence in the gonococcal protein of an RGD sequence, which is
typical of adhesins, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
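
The position of a short motif such as RGD can be located by a plain substring search over the amino acid sequence. The Python sketch below is a minimal illustration; the function name `find_motif` and the 1-based position convention are assumptions made here, not part of the original analysis.

```python
def find_motif(protein: str, motif: str = "RGD") -> list[int]:
    """Return the 1-based start positions of every occurrence of `motif`."""
    seq = "".join(protein.split()).upper()  # drop the spacing used in the listings
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(motif, start + 1)  # allow overlapping occurrences
    return positions


# A 20-residue fragment of the gonococcal sequence <SEQ ID 464>.
fragment = "GGHSLTRGDV RVIQQTSAPD"
print(find_motif(fragment))  # [7]
```

Positions are reported relative to the start of the string supplied, so a fragment gives fragment-relative positions.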

Example 56 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 467>:

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTG... 

This corresponds to the amino acid sequence <SEQ ID 468; ORF48>: 

1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 

51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI 

101 NLVPFILTAP APYQIMTGL. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 469>: 

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTTGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTATC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG 

401 CCGCCGCCAA AACCGACTTC CGGCACATTG CCGTCTGCGC CGCCGTTGTG 

451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGTCG 

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTACTACGCC AAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG

601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA 

651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG

751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 

801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 

951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC

1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT

1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGCCTGGCT 

1401 GAACTTCAAA ATCAAATAA

This corresponds to the amino acid sequence <SEQ ID 470; ORF48-1>:



1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN
51 LDYLPAALLI ALPWRFVKIA GVLAFWLAVL FDGLMMVIQL FPFMDLIGAI
101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAAKTDF RHIAVCAAVV
151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL
201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL
251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR
301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC
351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC
401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG
451 NLNETFRYLK QGHVAWLNFK IK*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF48 shows 94.1% identity over a 119aa overlap with an ORF (ORF48a) from strain A of N. meningitidis:



10 20 30 40 50 60 

orf48.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI
I | | I I 1 I I I I I II I I I I I I I I I I I I I I i I I I I I I 1 I M I I I I ! I I I I I > I IIMMM
orf48a MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI
10 20 30 40 50 60

70 80 90 100 110 119
orf48.pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL
| | M | Ml I I I I I I I I I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I
orf48a ALPWRXVKIXGVLAXWLAVLFDGLMMVIQLFPFMDLIGAINLVPFIXTAPALYQIMTGLL
70 80 90 100 110 120

orf48a LLYMLAMPFVLQKAAAKTDFRHIAACAAVVVAAGYFTGHLSXYDRGRMANIFGANNFYYA
130 140 150 160 170 180

The complete length ORF48a nucleotide sequence <SEQ ID 471> is: 

1 ATGAATATTC ACACCCTGCT CTCCAAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTNNCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGANTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTNTCGT 

201 CAAAATTGNC GGCGTATTGG CGTNTTGGCT GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGATCTCAT CGGCGCCATC

301 AACCTCGTCC CCTTCATCNT GACCGCCCCC GCCCTTTATC AGATAATGAC 

351 CGGGCTGTTA CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAGAAAG 

401 CCGCCGCCAA AACCGACTTC CGACACATTG CCGCCTGTGC CGCCGTTGTG

451 GTGGCAGCCG GCTATTTTAC CGGCCATTTG AGTTANTACG ACCGGGGGCG 

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTATTACGCC AAAAGTCAGG 

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGGCCTG 

601 GTCGATCCCG TCTTCCTCCC CTTGGGCAAT CAACAGCGTG CCGCCACGCA 

651 TCTGAACGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGCCAATCCC GAACTTCAAA ACGCCACTTT TGCCAAACTG
751 CTGGCGCAAA AAGANCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT

801 CATCGGCGCG ACGATCGAAG GCGAAATGCG CGAACTGTGT GCCTACGGCG
851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC
901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 
951 CGGCGCGGGC AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAGA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ANTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCNGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATATG GCCTGCCCGC CGAAACCGAC NTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TTCTTCGACC AACTGGCGGA TTTGATCCAA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGGCACG TCGNCTGGCT 

1401 GAACTTCAAA ATCAAATAA

This encodes a protein having amino acid sequence <SEQ ID 472>: 

1 MNIHTLLSKQ WTLPPFLPKR LLLSLLILLX PNAVFWVLAL LTATARPIVN
51 LXYLPAALLI ALPWRXVKIX GVLAXWLAVL FDGLMMVIQL FPFMDLIGAI
101 NLVPFIXTAP ALYQIMTGLL LLYMLAMPFV LQKAAAKTDF RHIAACAAVV
151 VAAGYFTGHL SXYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL
201 VDPVFLPLGN QQRAATHLNE PKSQKILFIV AESWGLPANP ELQNATFAKL
251 LAQKXRFSVW ESGSFPFIGA TIEGEMRELC AYGGLRGFAL RRAPDEKFAR
301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQEIKT AENLIGKKTC
351 AIFGGVCDSE LFGEVSAXFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC



401 TEYGLPAETD XCRNFSLHTQ FFDQLADLIQ RPEMKGTEVI IVGDHPPPVG
451 NLNETFRYLK QGHVXWLNFK IK*

ORF48a and ORF48-1 show 96.8% identity in 472 aa overlap: 

10 20 30 40 50 60 

orf48a.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLXPNAVFWVLALLTATARPIVNLXYLPAALLI
I I I I I I I I I I I I I I I M ! I I I I I I I I I M I I 1 II I I I I I I I I I I M I I M I I I I II 1 I
orf48-1 MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI
10 20 30 40 50 60

70 80 90 100 110 120
orf48a.pep ALPWRXVKIXGVLAXWLAVLFDGLMMVIQLFPFMDLIGAINLVPFIXTAPALYQIMTGLL
| | | | | | M | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I II
orf48-1 ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL
70 80 90 100 110 120

130 140 150 160 170 180
orf48a.pep LLYMLAMPFVLQKAAAKTDFRHIAACAAVVVAAGYFTGHLSXYDRGRMANIFGANNFYYA
| | | | [ M I I I I I I I II I I I I I I I I : II I I I : I I I I II II I I I I I I I I I I I I I M I I I M
orf48-1 LLYMLAMPFVLQKAAAKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA
130 140 150 160 170 180

190 200 210 220 230 240
orf48a.pep KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP
I I I I I M I [ I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I II
orf48-1 KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP
190 200 210 220 230 240

250 260 270 280 290 300
orf48a.pep ELQNATFAKLLAQKXRFSVWESGSFPFIGATIEGEMRELCAYGGLRGFALRRAPDEKFAR
I I I II I I I I I I I I I I II I I 1 I I I I I M I I I : I I I I I I I I I I I I I I I I I I I I M I
orf48-1 ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR
250 260 270 280 290 300

310 320 330 340 350 360
orf48a.pep CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE
' | | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II I II I I I I I I I I I II
orf48-1 CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE
310 320 330 340 350 360

370 380 390 400 410 420
orf48a.pep LFGEVSAXFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDXCRNFSLHTQ
orf48-1 LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ
370 380 390 400 410 420

430 440 450 460 470
orf48a.pep FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVXWLNFKIKX
I || I I I I I I I I I I M I! I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I
orf48-1 FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX

430 440 450 460 470 

Homology with a predicted ORF from N. gonorrhoeae

ORF48 shows 97.5% identity over a 119aa overlap with a predicted ORF (ORF48ng) from N. 
gonorrhoeae: 

orf48.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60

I | | | : | I I : I I I I I I I I I I I ! I I I I M I I I I I I I II II I I I I I I I ! I I I I I I I I I II I I I 
orf48ng MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI 60

orf48.pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGL 119

I I I I I I I I I I I II I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I M I II I 1 1 I I I 
orf48ng ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL 120
The ORF48ng nucleotide sequence <SEQ ID 473> was predicted to encode a protein having amino
acid sequence <SEQ ID 474>: 

1 MNIHALLSEQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN
51 LDYLPAALLI ALPWRFVKIA GVLAFWPAVL FDGLMMVIQL FPFMDLIGAI
101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAVKTDF RHIAVCAAVV
151 AAARYFTGPF ELLRTGGRWQ YVQHRRLLLS GSRASFRRRQ KADVLRRLGN
201 PYASMGNGG..

Further work identified the complete gonococcal DNA sequence <SEQ ID 475>: 

1 ATGAATATTC ACGCCCTGCT CTCCGAACAA TGGACGCTGC CGCCATTCCT 

51 GCCGAAACGG CTGCTGCTGT CCCTGCTGAT ACTGCTGGCC CCCAATGCGG 

101 TGTTTTGGGT TTTGGCACTG CTGACCGCCA CCGCCCGCCC GATTGTCAAT 

151 TTGGACTACC TTCCCGCCGC GCTGCTGATC GCCCTGCCTT GGCGTTTCGT 

201 CAAAATTGCC GGCGTATTGG CGTTTTGGCC GGCGGTTTTG TTTGACGGGC 

251 TGATGATGGT GATCCAACTC TTCCCTTTTA TGGACCTCAT CGGCGCCATC 

301 AACCTCGTCC CCTTCATCCT GACCGCCCCC GCCCCTTATC AGATAATGAC 

351 CGGGCTGTTG CTGCTGTATA TGCTGGCGAT GCCGTTTGTG TTGCAAAAAG 

401 CCGCCGTCAA AACCGACTTC CGACACATTG CCGTCTGTGC CGCCGTTGTG 

451 GCGGCAGCCG GCTATTTCAC CGGCCATTTG AGTTACTACG ACCGGGGGCG 

501 GATGGCCAAT ATCTTCGGCG CAAACAACTT CTATTACGCc aAAAGTCAGG

551 CGATGCTCTA CACCGTCAGC CAGAATGCCG ACTTTATTAC CGCCGgcctG 

601 GTCGACCCCG TCTTCCTCCC CTTGGGCAAT CAGCAGCGTG CCGCCACGCG 

651 GCTGAGTGAG CCGAAATCTC AAAAAATCCT CTTTATCGTC GCCGAATCTT 

701 GGGGGCTGCC GGGCAATCCC GAGCTTCAAA ACGCCACTTT TGCCAAACTG

751 CTGGCGCAAA AAGACCGTTT TTCGGTTTGG GAAAGCGGCA GTTTTCCCTT 

801 CATCGGCGCG ACGGTCGAAG GCGAAATGCG CGAATTGTGC GCCTACGGCG 

851 GTTTGCGCGG GTTCGCACTG CGCCGCGCGC CCGACGAAAA ATTTGCCCGC 

901 TGCCTCCCCA ACCGTTTGAA ACAAGAAGGT TACGCCACCT TTGCGATGCA 

951 CGGCGCGGGT AGTTCGCTTT ACGACCGCTT CAGCTGGTAT CCGAGGGCGG 

1001 GCTTTCAAAA AATCAAAACC GCCGAAAACC TGATCGGTAA AAAAACCTGC 

1051 GCCATTTTCG GCGGCGTGTG CGACAGCGAG CTGTTCGGCG AAGTGTCGGC 

1101 ATTTTTCAAA AAACACGACA AGGGACTGTT TTACTGGATG ACGCTGACCA 

1151 GCCACGCCGA CTATCCCGAA TCCGACATTT TCAACCACAG GCTCAAATGC 

1201 ACCGAATACG GCCTGCCCGC CGAAACCGAC CTCTGCCGCA ATTTCAGCCT 

1251 GCACACCCAA TtCttcgACC AACTGGCGGA TTTGATCCGA CGCCCCGAAA 

1301 TGAAAGGCAC GGAAGTCATC ATCGTCGGCG ACCATCCGCC GCCCGTCGGC 

1351 AACCTCAATG AAACCTTCCG CTACCTCAAA CAGGGACACG TCGCCTGGCT 

1401 GCACTTCAAA ATCAAATAA

This encodes a protein having amino acid sequence <SEQ ID 476; ORF48ng-l>: 

1 MNIHALLSEQ WTLPPFLPKR LLLSLLILLA PNAVFWVLAL LTATARPIVN 

51 LDYLPAALLI ALPWRFVKIA GVLAFWPAVL FDGLMMVIQL FPFMDLIGAI 

101 NLVPFILTAP APYQIMTGLL LLYMLAMPFV LQKAAVKTDF RHIAVCAAVV 

151 AAAGYFTGHL SYYDRGRMAN IFGANNFYYA KSQAMLYTVS QNADFITAGL 

201 VDPVFLPLGN QQRAATRLSE PKSQKILFIV AESWGLPGNP ELQNATFAKL 

251 LAQKDRFSVW ESGSFPFIGA TVEGEMRELC AYGGLRGFAL RRAPDEKFAR

301 CLPNRLKQEG YATFAMHGAG SSLYDRFSWY PRAGFQKIKT AENLIGKKTC 

351 AIFGGVCDSE LFGEVSAFFK KHDKGLFYWM TLTSHADYPE SDIFNHRLKC 

401 TEYGLPAETD LCRNFSLHTQ FFDQLADLIR RPEMKGTEVI IVGDHPPPVG

451 NLNETFRYLK QGHVAWLHFK IK*

ORF48ng-1 and ORF48-1 show 97.9% identity in 472 aa overlap:

10 20 30 40 50 60 

orf48-1.pep MNIHTLLSKQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI
I I I I : I I I : I I I I I II I I I I II I I 11 I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I 
orf48ng-1 MNIHALLSEQWTLPPFLPKRLLLSLLILLAPNAVFWVLALLTATARPIVNLDYLPAALLI

10 20 30 40 50 60 



70 80 90 100 110 120 

orf48-1.pep ALPWRFVKIAGVLAFWLAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL
I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I 1 I M II I I I I I 
orf48ng-1 ALPWRFVKIAGVLAFWPAVLFDGLMMVIQLFPFMDLIGAINLVPFILTAPAPYQIMTGLL
70 80 90 100 110 120 



130 140 150 160 170 180 

orf48-1.pep LLYMLAMPFVLQKAAAKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA
| | | ] | | | | | | | I I I I : 1 I I I I I 1 I i I I I I I I I I I M M I I II I I I 1 I I I I M I I I M I I 1
orf48ng-1 LLYMLAMPFVLQKAAVKTDFRHIAVCAAVVAAAGYFTGHLSYYDRGRMANIFGANNFYYA
130 140 150 160 170 180

190 200 210 220 230 240
orf48-1.pep KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATHLNEPKSQKILFIVAESWGLPANP
| | | | [ I | I I I I I I I I I I I I II I I I I I I I I I I I I I I I : I = I I I I I I I I I I I I I N I I I : I t
orf48ng-1 KSQAMLYTVSQNADFITAGLVDPVFLPLGNQQRAATRLSEPKSQKILFIVAESWGLPGNP
190 200 210 220 230 240

250 260 270 280 290 300
orf48-1.pep ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR
I | I | I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II I I I I I I I M I I M I I I I I I I 1 I I
orf48ng-1 ELQNATFAKLLAQKDRFSVWESGSFPFIGATVEGEMRELCAYGGLRGFALRRAPDEKFAR
250 260 270 280 290 300

310 320 330 340 350 360
orf48-1.pep CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQEIKTAENLIGKKTCAIFGGVCDSE
I I I I I I I I I I I I Ill I I II I : II I i I I I I I I I I I I I I I I II I II
orf48ng-1 CLPNRLKQEGYATFAMHGAGSSLYDRFSWYPRAGFQKIKTAENLIGKKTCAIFGGVCDSE
310 320 330 340 350 360

370 380 390 400 410 420
orf48-1.pep LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ
I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
orf48ng-1 LFGEVSAFFKKHDKGLFYWMTLTSHADYPESDIFNHRLKCTEYGLPAETDLCRNFSLHTQ
370 380 390 400 410 420

430 440 450 460 470
orf48-1.pep FFDQLADLIQRPEMKGTEVIIVGDHPPPVGNLNETFRYLKQGHVAWLNFKIKX



Based on this analysis, including the presence of a putative leader sequence (double-underlined) 
and two putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
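
Putative transmembrane domains of the kind mentioned above are commonly screened for with a sliding-window hydropathy profile. The Python sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and the often-quoted cut-off of about 1.6; this is an assumption chosen for illustration, since the method actually used to predict the leader sequence and transmembrane domains is not stated here. The function name `hydropathy_profile` is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list[float]:
    """Mean Kyte-Doolittle score of every window; unknown residues score 0.0."""
    seq = "".join(protein.split()).upper()
    if window < 1 or len(seq) < window:
        return []
    scores = [KD.get(aa, 0.0) for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


# Residues 21-39 of <SEQ ID 476>, a strongly hydrophobic stretch.
segment = "LLLSLLILLA PNAVFWVLA"
profile = hydropathy_profile(segment)
print(profile[0] > 1.6)  # True
```

A window whose mean exceeds the cut-off is only a candidate membrane-spanning segment; the threshold and window length are conventional choices and can be adjusted.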



Example 57

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 477>: 

1 ..GTGAGCGGAC GTTACCGCGC TTTGGATCGC GTTTCCAAAA TCATCATCGT
51 TACTTTGAGT ATCGCCACGC TTGCCGCCGC CGGCATCGCT ATGTCGCGCG
101 GTATGCAGAT GCAGTCCGAT TTTATCGAGC CGACACCGTG GACGCTTGCC
151 GGTTTGGGCT TCCTGATCGC GCTGATGGGC TGGATGCCCG CGCCGATTGA
201 AATTTCCGCC ATCAATTCTT TGTGGGTAAC CGAAAAACAA CGCATCAATC
251 CTTCCGAATA CCGCGACGGG ATTTTTGAAT TCAACGTCGG TTATATCGCC
301 AGTGCGGTTT TGGCTTTGGT TTTCCTTGCA CTGGGCGC.G TAGCGCCGAA
351 CGGCAACGGC GA.ACAGTGC AGATGGCGGG CGGCAAATAT AACGGGCAAT
401 TGATCAATAT GTACGCC..

This corresponds to the amino acid sequence <SEQ ID 478; ORF53>: 

1 ..VSGRYRALDR VSKIIIVTLS IATLAAAGIA MSRGMQMQSD FIEPTPWTLA
51 GLGFLIALMG WMPAPIEISA INSLWVTEKQ RINPSEYRDG IFEFNVGYIA
101 SAVLALVFLA LGXVAPNGNG XTVQMAGGKY NGQLINMYA..
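
The correspondence between the partial DNA sequence <SEQ ID 477> and the ORF53 amino acid sequence <SEQ ID 478> can be checked by translating the DNA in its first reading frame. The following is a minimal sketch, assuming the standard genetic code; the helper name `translate` is hypothetical, and a codon containing an unreadable base (printed as "." in the listing) is reported as X, as in the listing above.

```python
# Minimal sketch (assumption: standard genetic code, reading frame 1).
BASES = "TCAG"
# Amino acids for all 64 codons, ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate complete codons; a codon with an unknown base gives 'X'."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(dna[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


# First 50 nucleotides of the partial sequence <SEQ ID 477>.
fragment = "GTGAGCGGAC GTTACCGCGC TTTGGATCGC GTTTCCAAAA TCATCATCGT"
print(translate(fragment))  # VSGRYRALDRVSKIII
```

The printed residues agree with the start of <SEQ ID 478> (VSGRYRALDR VSKIII...).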

Further work revealed the complete nucleotide sequence <SEQ ID 479>:



1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG
51 TCCGGGGATC ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG

101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC 

151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA 

201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC 

251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT 

301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT 

351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT 

401 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT 

451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG

501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA 

551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG 

601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA 

651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA 

701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG 

751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG 

801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT 

851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG 

901 ACGATTACCG TCGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG 

951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA

1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC 

1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC 

1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTTAAAGGT GATGAAAAAC

1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT 

1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA 

1251 ATGA 

This corresponds to the amino acid sequence <SEQ ID 480; ORF53-1>:

1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAGA LYGWQIALII
51 ILTNLFKYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLWVF LILCILSATI
101 NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL IMASCLIILV SGRYRALDRV
151 SKIIIVTLSI ATLAAAGIAM SRGMQMQSDF IEPTPWTLAG LGFLIALMGW
201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGYIAS AVLALVFLAL
251 GAFVQYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPLVA FIAFACMYGT
301 TITVVDGYAR AIAEPVRLLR GKDKTGNAEF FAWNIWVAGS GLAVIFWFDG
351 VMANLLKFAM IAAFVSAPVF AWLNYRLVKG DEKHKLTSGM NALALAGLIY
401 LTGFTVLFLL NLAGMFK*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF53 shows 93.5% identity over a 139aa overlap with an ORF (ORF53a) from strain A of N. 
meningitidis: 

                                                 10        20        30
orf53.pep                                VSGRYRALDRVSKIIIVTLSIATLAAAGIA
                                         ||||||||||||||||||||||||||||||
orf53a     AAIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIA
         110       120       130       140       150       160

                   40        50        60        70        80        90
orf53.pep  MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53a     MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG
         170       180       190       200       210       220

                  100       110       120       130      139
orf53.pep  IFEFNVGYIASAVLALVFLALGXVAPNGNGXTVQMAGGKYNGQLINMYA
           ||:|||||||||||||||||||  :  ||| :|||||||| ||||||||
orf53a     IFDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLV
         230       240       250       260       270       280
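
The identity figure quoted for ORF53 and ORF53a can be re-derived by counting identical positions in the ungapped overlap. The following is a minimal sketch; the helper name `percent_identity` is hypothetical, the two strings are the 139 aligned residues taken from the listings above, and an unresolved residue (X) is assumed to count as a mismatch.

```python
def percent_identity(seq_a, seq_b):
    """Return (identical positions, overlap length, percent identity)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches, len(seq_a), round(100.0 * matches / len(seq_a), 1)


# Residues 1-90 of ORF53 are identical to residues 140-229 of ORF53a.
shared = (
    "VSGRYRALDRVSKIIIVTLSIATLAAAGIA"
    "MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG"
)
# Residues 91-139 of ORF53 and 230-278 of ORF53a.
orf53 = shared + "IFEFNVGYIASAVLALVFLALGXVAPNGNGXTVQMAGGKYNGQLINMYA"
orf53a = shared + "IFDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYA"

print(percent_identity(orf53, orf53a))  # (130, 139, 93.5)
```

Nine positions differ, two of which are unresolved residues in the partial ORF53 sequence, giving 130 identities in 139 positions.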



The complete length ORF53a nucleotide sequence <SEQ ID 481> is:

1 ATGTCCGAAC AACATATTTC GACTTGGAAA AGTAAAATCA ACGCATTGGG

51 ACCGGGGATT ATGATGGCTT CGGCGGCGGT CGGCGGTTCG CACCTGATTG 

101 CCTCGACGCA GGCGGGCGCG CTTTACGGCT GGCAGATCGC GCTCATCATC 

151 ATCCTGACCA ACCTCTTCAA ATACCCGTTT TTCCGCTTCA GCGCGCATTA 

201 CACGCTGGAC ACGGGCAAGA GCCTGATTGA AGGTTATGCC GAGAAAAGCC 

251 GCGTTTATTT GTGGGTATTC CTGATTTTGT GCATCCTCTC CGCCACGATT 

301 AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA AAATGGCGAT 

351 TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG ATTATGGCAT 

401 CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT GGATCGCGTT

451 TCCAAAATCA TCATCGTTAC TTTGAGTATC GCCACGCTTG CCGCCGCCGG

501 CATCGCTATG TCGCGCGGTA TGCAGATGCA GTCCGATTTT ATCGAGCCGA 

551 CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT GATGGGCTGG 

601 ATGCCCGCGC CGATTGAAAT TTCCGCCATC AATTCTTTGT GGGTAACCGA

651 AAAACAACGC ATCAATCCTT CCGAATACCG CGACGGGATT TTTGATTTCA 

701 ACGTCGGTTA TATCGCCAGT GCGGTTTTGG CTTTGGTTTT CCTTGCACTG

751 GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA TGGCGGGCGG

801 CAAATATATC GGGCAATTGA TCAATATGTA CGCCGTTACC ATCGGCGGCT 

851 GGTCGCGCCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT GTACGGCACG 

901 ACGATTACCG TTGTGGACGG CTATGCCCGT GCCATTGCCG AACCCGTGCG 

951 CCTGCTGCGC GGAAAAGACA AAACGGGCAA CGCCGAATTC TTTGCCTGGA 

1001 ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG GTTTGACGGC 

1051 GTAATGGCGA ATCTGCTCAA ATTTGCGATG ATTGCCGCTT TTGTGTCCGC 

1101 CCCTGTGTTT GCCTGGCTGA ATTACCGTTT GGTCAAAGGT GATGAAAAAC 

1151 ACAAACTCAC ATCAGGTATG AATGCCCTTG CATTGGCAGG CTTGATTTAT 

1201 CTGACCGGTT TTACCGTTTT GTTCTTATTG AATTTGGCGG GAATGTTCAA

1251 ATGA 

This encodes a protein having amino acid sequence <SEQ ID 482>: 

1 MSEQHISTWK SKINALGPGI MMASAAVGGS HLIASTQAGA LYGWQIALII
51 ILTNLFKYPF FRFSAHYTLD TGKSLIEGYA EKSRVYLWVF LILCILSATI
101 NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL IMASCLIILV SGRYRALDRV
151 SKIIIVTLSI ATLAAAGIAM SRGMQMQSDF IEPTPWTLAG LGFLIALMGW
201 MPAPIEISAI NSLWVTEKQR INPSEYRDGI FDFNVGYIAS AVLALVFLAL
251 GAFVQYGNGE AVQMAGGKYI GQLINMYAVT IGGWSRPLVA FIAFACMYGT
301 TITVVDGYAR AIAEPVRLLR GKDKTGNAEF FAWNIWVAGS GLAVIFWFDG
351 VMANLLKFAM IAAFVSAPVF AWLNYRLVKG DEKHKLTSGM NALALAGLIY
401 LTGFTVLFLL NLAGMFK*

ORF53a shows 100.0% identity in a 417 aa overlap with ORF53-1:

                   10        20        30        40        50        60
orf53a.pep MSEQHISTWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALIIILTNLFKYPF
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1    MSEQHISTWKSKINALGPGIMMASAAVGGSHLIASTQAGALYGWQIALIIILTNLFKYPF
                   10        20        30        40        50        60

                   70        80        90       100       110       120
orf53a.pep FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1    FRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTAAIVKMAIPSL
                   70        80        90       100       110       120

                  130       140       150       160       170       180
orf53a.pep MFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAMSRGMQMQSDF
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1    MFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAMSRGMQMQSDF
                  130       140       150       160       170       180

                  190       200       210       220       230       240
orf53a.pep IEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1    IEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGIFDFNVGYIAS
                  190       200       210       220       230       240

                  250       260       270       280       290       300
orf53a.pep AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1    AVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVAFIAFACMYGT
                  250       260       270       280       290       300

                  310       320       330       340       350       360
orf53a.pep TITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDGVMANLLKFAM
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1    TITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDGVMANLLKFAM
                  310       320       330       340       350       360

                  370       380       390       400       410
orf53a.pep IAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53-1    IAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLLNLAGMFKX
                  370       380       390       400       410



Homology with a predicted ORF from N. gonorrhoeae

ORF53 shows 92.1% identity over a 139 aa overlap with a predicted ORF (ORF53ng) from N.
gonorrhoeae:

orf53.pep                                VSGRYRALDRVSKIIIVTLSIATLAAAGIA   30
                                         ||||||||||||||||||||||||||||||
orf53ng    AAIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIA   91

orf53.pep  MSRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG   90
           ||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf53ng    MSRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDG  151

orf53.pep  IFEFNVGYIASAVLALVFLALGXVAPNGNGXTVQMAGGKYNGQLINMYA  139
           ||:|||||||||||||||||||  :  ||| :|||:|||| ||||||||
orf53ng    IFDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMGGGKYIGQLINMYAVTIGGGSRPLV  211

An ORF53ng nucleotide sequence <SEQ ID 483> was predicted to encode a protein having amino
acid sequence <SEQ ID 484>:

1 MPKKSCVYLW VFLILCIASA TINAGAVAIV TAAIVKMAIP SLMFDAGTVA
51 ALIMASCLII LVSGRYRALD RVSKIIIVTL SIATLAAAGI AMSRGMQMQP
101 DFIEPTPWTL AGLGFLIALM GWMPAPIEIS AINSLWVTEK QRINPSEYRD
151 GIFDFNVGYI ASAVLALVFL ALGAFVQYGN GEAVQMGGGK YIGQLINMYA
201 VTIGGGSRPL VAFIAFACMY GAASTVVDGY ARAIAEPVRL LRGKDKTARP
251 IVLLEKLGGR HRFGRDFLV*

Further analysis revealed a further partial gonococcal DNA sequence <SEQ ID 485>:



1 ..aagaAAAGCT GCGTTTATTT GTGGGTTTTT TTGATTTTGT GTATCGCCTC
51 CGCCACGATT AACGCGGGCG CGGTCGCCAT TGTAACCGCC GCCATCGTCA
101 AAATGGCGAT TCCCTCGCTG ATGTTTGATG CCGGCACGGT TGCCGCCTTG
151 ATTATGGCAT CCTGCCTGAT TATTTTGGTG AGCGGACGTT ACCGCGCTTT
201 GGATCGTGTT TCCAAAATCA TCATTGTTAC TTTGAGCATC GCCACGCTTG
251 CCGCCGCCGG CATCGCTATG TCGCGCGGTA TGCAGATGCA GCCCGATTTT
301 ATCGAGCCGA CACCGTGGAC GCTTGCCGGT TTGGGCTTCC TGATCGCGCT
351 GATGGGCTGG ATGCCCGCGC CGATCGAAAT TTCCGCCATC AATTCTTTGT
401 GGGTAACCGA AAAACAACGC ATCAATCCTT CTGAATACCG CGACGGGATT
451 TTCGATTTCA ACGTCGGTTA TATCGCcagT GCGGTTTTGG CTTTGGTTTT
501 CCTTGCACTG GGCGCGTTTG TGCAATACGG CAACGGCGAA GCAGTGCAGA
551 TGGCGGGCGG CAAATATATC GGGCAATTGA TTAATATGTA TGCCGTAACC
601 ATCGGCGGCT GGTCTCGTCC GCTGGTGGCG TTTATCGCGT TTGCCTGTAT
651 GTACGGCACG ACGATTACCG TTGTGGACGG TTATGCGCGT GCCATTGCCG
701 AACCCGTGCG CCTGCTGCGC GGCAGGGATA AAACCGGCAA CGCCGAGTTG
751 TTtgccTGGA ATATTTGGGT GGCGGGCAGC GGTTTGGCGG TGATTTTCTG
801 GTTTGACggc gcaaTGGCgG AACtgcTCAA ATTTGCGATG ATtgccgcCT
851 TTGTGTCCGC CCCTGTGTTC GCCTGGCTCA ACTACCGCCT CGTCAAAGGG
901 GACAAACGCC ACAGGCTTAC CGCCGGTATG AACGCCCTTG CCATTGTCGG
951 CCTGCTCTAC CTGGCCGGGT TTGCCGTTTT GTTCCTGTTG AACCTTACCG
1001 GACTTTTGGC ATAG

This corresponds to the amino acid sequence <SEQ ID 486; ORF53ng-1>:

1 KKSCVYLWVF LILCIASATI NAGAVAIVTA AIVKMAIPSL MFDAGTVAAL
51 IMASCLIILV SGRYRALDRV SKIIIVTLSI ATLAAAGIAM SRGMQMQPDF
101 IEPTPWTLAG LGFLIALMGW MPAPIEISAI NSLWVTEKQR INPSEYRDGI
151 FDFNVGYIAS AVLALVFLAL GAFVQYGNGE AVQMAGGKYI GQLINMYAVT
201 IGGWSRPLVA FIAFACMYGT TITVVDGYAR AIAEPVRLLR GRDKTGNAEL
251 FAWNIWVAGS GLAVIFWFDG AMAELLKFAM IAAFVSAPVF AWLNYRLVKG
301 DKRHRLTAGM NALAIVGLLY LAGFAVLFLL NLTGLLA*

ORF53ng-1 and ORF53-1 show 94.0% identity in a 336 aa overlap:



25 



rf53-l.pep I 



orf 53-1. pep 



orf 53-1 .pep 



orf 53-1 . pep 



60 70 80 90 100 110 

LTNLFKYPFFRFSAHYTLDTGKSLIEGYAEKSRVYLWVFLILCILSATINAGAVAIVTA 
: I I I I I I I ] I I I I I I I I I I I I I I I I I I I 
KKSCVYLWVFLILCIASATINAGAVAIVTA 



20 



30 



120 130 140 150 160 170 

AIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAM 
| | | | | | | | I I I I I I I I I II I 1 I 1 I I I I I i I I I II I II II II II M ! I I I I I I I I I I M I I 
AIVKMAIPSLMFDAGTVAALIMASCLIILVSGRYRALDRVSKIIIVTLSIATLAAAGIAM 



50 



60 



70 



90 



40 



180 190 200 210 220 230 

SRGMQMQSDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGI 
I M I I I I I I I I 11 I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I M M I I I M I I I I I I 
SRGMQMQPDFIEPTPWTLAGLGFLIALMGWMPAPIEISAINSLWVTEKQRINPSEYRDGI 

100 110 120 130 140 150 

240 250 260 270 280 290 

FDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYIGQLINMYAVTIGGWSRPLVA 
I I I I I I I 1 I 1 I I II I I I M I I I I I II I I I I II I I I I I I I I I I I I I I I I I I II II I I I I I I 
FDFNVGYIASAVLALVFLALGAFVQYGNGEAVQMAGGKYI GQLINMYAVT IGGWSRPLVA 

160 170 180 190 200 210 

300 310 320 330 340 350 

FIAFACMYGTTITVVDGYARAIAEPVRLLRGKDKTGNAEFFAWNIWVAGSGLAVIFWFDG 
I I I I I I I I I I I 1 M I I I I I I I I I I I I I I I I I : 1 I I I I I I : I I I I I I I I I I I I I I I I I I M 
FIAFACMYGTTITWDGYARAIAEPVRLLRGRDKTGNAELFAWNIWVAGSGLAVIFWFDG 

220 230 240 250 260 270 

360 370 380 390 400 410 

VMANLLKFAMIAAFVSAPVFAWLNYRLVKGDEKHKLTSGMNALALAGLIYLTGFTVLFLL 



rf53-l.pep NLAGMFKX 



Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 58 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 487>:

1 ..TTGCGGGAAA CGGCATATGT TTTGGATAGT TTTGATCGTT ATTTTGTTGT
51 TGCGCTTGCC GGCTTGTTTT TTGTCCGCGC ACAATCCGAA CGCGAGTGGA
101 TGCGCGAGGT TTCTGCGTGG CAGGAAAAGA AAGGGGAAAA ACAGGCGGAG
151 CTGCCTGAAA TCAAAGACGG TATGCCCGAT TTTCCCGAAC TTGCCCTGAT
201 GCTTTTCCAC GCCGTCAAAA CGGCAGTGTA TTGGCTGTTT GTCGGTGTCG
251 TCCGTTTCTG CCGAAACTAT CTGGCGCACG AATCCGAACC GGACAGGCCC
301 GTTCCGCCT..

This corresponds to the amino acid sequence <SEQ ID 488; ORF58>: 

1 ..LRETAYVLDS FDRYFVVALA GLFFVRAQSE REWMREVSAW QEKKGEKQAE
51 LPEIKDGMPD FPELALMLFH AVKTAVYWLF VGVVRFCRNY LAHESEPDRP
101 VPP..

Further work revealed the complete nucleotide sequence <SEQ ID 489>: 

1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA 

151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT

301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG

351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG

401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC

451 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA

501 AATTTCGCCC GTCCGTCCGG TTTTTAAAGA AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCATATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG 

751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC

801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCACCGTC
851 ATGCAGGGCA GGGGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC 
901 CAAGGGCAGT CCGTTTCAGA CGGCACGGCC GTCCGCGATG CCCGCCGCCG 
951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGAATTTC TCGCCTGATT CCGGAAAGTC AGACGGTTGT CGGGAAACGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC 

1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAACTGCC GATATCCATA 

1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 

1201 CCGAAAGTTC CCATGACCGC AATCGATATT CAGCCGCCGC CTCCCGTATC 

1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GTCAGGATTC GAGCAGGTGC

1301 AACGCAGCCG CATTGCCGAG ACCGACCATC TTGCCGATGA TGTTTTGAAT 

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGGATGACG GCAGTGAAGG 

1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC

1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCCATC

1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACCGAAGA AGAACTGTTG 

1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT AATTACGCGT TATGAAATCG 

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATCT GGAAAAAGAT 

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC 

1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA

1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC 

1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC 

2001 CGACTTGGGA AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG 

2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 

2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT 

2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA 

2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA
2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGTAATCTTG CGGGCTTCAA

2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT
2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC

2401 GTGGTCGTGG TCGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA
2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA
2501 TCCATTTGAT TCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT
2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA
2601 AATCGACAGC CGCACGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG
2651 GTCAGGGCGA TATGCTGTTC CTGCTGCCGG GTACTGCCTA TCCGCAGCGC
2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA

2751 TTTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATT TTGAGCGGCG

2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGACGAAACC



2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC

2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG TATCGGCTAC AACCGCGCCG 

2951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA 

3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA 

This corresponds to the amino acid sequence <SEQ ID 490; ORF58-1>:

1 MFWIVLIVIL LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK
51 DGMPDFPELA LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS
101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED IATAVIDNRR
151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI
201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSHM FDADKEAFSE
251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FHRHAGQGKG QAEAKSPDVS
301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESQTVVGKR
351 DVEMPSETEN VFTETVSSVG YGGPVYDETA DIHIEEPAAP DAWVVEPPEV
401 PKVPMTAIDI QPPPPVSEIY NRTYEPPSGF EQVQRSRIAE TDHLADDVLN
451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFENVPSER
501 PSCRVSDTEA DEGAFPSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL
551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD
601 LARSLGVASI RVVETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS
651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA
701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK
751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEKLPFI
801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG
851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LLPGTAYPQR
901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDDET
951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE
1001 HNGNRTILVP LDNA*

Computer analysis of this amino acid sequence predicts the indicated transmembrane region, and 
also gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF58 shows 96.6% identity over an 89 aa overlap with an ORF (ORF58a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf 58 . pep LRETAYVLDSFDRYFW ALAGLFFVRAQS EREWMREVSAWQEKKGEKQAELPEIKDGMPD 

o r f 5 8 a MFWIVLIVILLLALAGLFFVRAQS EREWMREVSAWQEKKGEKQAELPEIKDGMPD 
10 20 30 40 50 



70 80 90 100 

FPELALM LFHAVKTAVYWLFVGW RFCRNYLAHESEPDRPVPP 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

FPELALM LFHAVKTAVYWLFVGW RFCRNYLAHESEPDRPVPPASANRADVPTASDGYSD 
60 70 80 90 100 110 



The complete length ORF58a nucleotide sequence <SEQ ID 491> is:



1 ATGTTTTGGA TAGTTTTGAT CGTTATTTTG TTGCTTGCGC TTGCCGGCTT 

51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG 

101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA

151 GACGGTATGC CCGATTTTCC CGAACTTGCC CTGATGCTTT TCCATGCCGT 

201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA 

251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT 

301 GCAAATCGTG CGGATGTTCC GACCGCATCC GACGGATATT CAGACAGTGG

351 AAACGGGACG GAAGAAGCGG AAACGGAAGA AGCAGAAGCT GCGGAGGAAG 

401 AGGCTGCCGA TACGGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC

451 ATCCCATTCG ACCGGAGTAT TGCTGAAGGG TTGATGCCGT CTGAAAGCGA

501 AATTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA 

551 CGCGTGCTTT AAACAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC 

601 GATGCATTTG AGAAAAACGA AACAGCGGTC CCCAAAGTCC GCGTGTCCGA 

651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC 

701 AACGCACGTA TTCCCGTATG TTCGATGCGG ACAAAGAAGC GTTTTCCGAG
751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC

801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC 

851 ATGCAGGGCA GGGNAAAGGG CAGGCGGAGG CNAAATCCCC GGATGTTTCC 

901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCNGCCGCCG 

951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG 

1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG 

1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAANTGTTTC 

1101 GTCTGTGGGA TACGGCGNTC CGGTTTATGA TGAAACTGCC GATATCCATA 

1151 TTGAAGAACC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG 

1201 CCGAAAGTTC CCATGCCCGC AATNGATATT CCGCCGCCGC CTCCCGTATC

1251 GGAAATCTAC AACCGTACCT ATGAACCGCC GGCAGGATTC GAGCAGGTGC 

1301 AACGCAGCCG CATTGCCGAA ACCGATCATC TTGCCGATGA TGTTTTGAAT 

1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCGAATGACG GCAGTGAGGG 

1401 TGTGGCAGAG CGGTCAAGCG GGCAATATTT GTCGGAAACC GAAGCGTTCG

1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAAATGTGCC GTCTGAACGC

1501 CCGTCCCGCC GGGCATNGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC

1551 TGAAGAAACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC 

1601 TGCCGCCGCT GTTCAATCCC GGGGCGACGC AAACCGAAGA AGANCTGTTG 

1651 GANAACAGCA TCACCATCGA AGAAAAATNG GCGGAGTTCA AAGTCAAGGT 

1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG

1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTAAATCT GGAAAAAGAN

1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCT
1851 CGGCAAAACC TGTATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA 
1901 TACGCCTGAG CGAAATCTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC 
1951 AAGCTGACGC TCGCGCTCGG TCAGGACATC ACCGGACAGC CCGTCGTAAC 
2001 CGACTTGGGC AAAGCACCGC ATTTGTTGGT TGCCGGCACG ACCGGTTCGG 
2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC 
2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT
2151 GAGCATTTAC GAAGGCATCC CGCACCTGCT CGCCCCTGTC GTTACCGATA
2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA
2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGTNTCAA
2301 TCAAAAAATC GCCGAAGCCG CAGCAAGGGG GGAGAAAATC GGCAACCCGT 
2351 TCAGCCTCAC GCCCGACAAT CCCGAACCTT TGGANAAATT GCCGTTTATC 
2401 GTGGTCGTGG TTGATGAGTT TGCCGACCTG ATGATGACGG CAGGCAAGAA 
2451 AATCGAAGAA CTGATTGCCC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA
2501 TCCATCTTAT CCTTGCCACA CAACGCCCCA GTGTCGATGT CATCACGGGT 
2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA 
2601 AATCGACAGC CGCACGATTC TTGACCAAAT GGGTGCGGAA AACCTGCTCG 
2651 GGCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACGGCCTA TCCGCAGCGC 

2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA
2751 TCTGAAACAG TTTGGCGAAC CGGACTATGT TGACGATATN TTGAGCGGCG

2801 GTATGTCCGA CGATTTGCTG GGAATCAGCC GGAGCGGCGA CGGCGAAACC
2851 GATCCGATGT ACGACGAGGC CGTGTCNGTT GTTTTGAAAA CGCGCAAAGC
2901 CAGCATTTCT GGCGTGCAGC GCGCATTGCG TATCGGCTAT AATCGCGCCG
2951 CGCGTCTGAT TGACCAGATG GAGGCGGAAG GCATTGTGTC CGCACCGGAA
3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTNGACAATG CTTGA 

This encodes a protein having amino acid sequence <SEQ ID 492>: 

1 MFWIVLIVIL LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK
51 DGMPDFPELA LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS
101 ANRADVPTAS DGYSDSGNGT EEAETEEAEA AEEEAADTED IATAVIDNRR
151 IPFDRSIAEG LMPSESEISP VRPVFKEITL EEATRALNSA ALRETKKRYI
201 DAFEKNETAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE
251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FRRHAGQGKG QAEAKSPDVS
301 QGQSVSDGTA VRDAXRRVSV NLKEPNKATV SAEARISRLI PESRTVVGKR
351 DVEMPSETEN VFTEXVSSVG YGXPVYDETA DIHIEEPAAP DAWVVEPPEV
401 PKVPMPAXDI PPPPPVSEIY NRTYEPPAGF EQVQRSRIAE TDHLADDVLN
451 GGWQEETAAI ANDGSEGVAE RSSGQYLSET EAFGHDSQAV CPFENVPSER
501 PSRRAXDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP GATQTEEXLL
551 XNSITIEEKX AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKX
601 LARSLGVASI RVVETILGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS
651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA
701 APEDVRMIMI DPKMLELSIY EGIPHLLAPV VTDMKLAANA LNWCVNEMEK
751 RYRLMSFMGV RNLAGXNQKI AEAAARGEKI GNPFSLTPDN PEPLXKLPFI
801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG
851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR
901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDX LSGGMSDDLL GISRSGDGET
951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE
1001 HNGNRTILVP XDNA*

ORF58a and ORF58-1 show 96.6% identity in a 1014 aa overlap:

10 20 30 40 50 60 

orf58a pep MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA 

I | | | | | M I i I II I I I I I M I I I M I I I I I I M M I I I I I I I I I I I I I I I I I II II M I I 

orf 58-1 MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 58a pep LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT 

| | I I I I I I I I I I I II I I I II I I I I I I I I I II I I I I Mill 

orf 58-1 LMLFHAVKTAVYWLFVGWRFCRNYLAHESEPDRPVPPASANRADVPTAS DGYSDSGNGT 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 58a pep EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL 
I I I I I I I I I I I I I II I I I I I I! I I I II I I I I I II M I I I I I I I I I I I I I I I I I I I I I I I I 
orf 58-1 EEAETEEAEAAEEEAADTEDIATAVIDNRRIPFDRSIAEGLMPSESEISPVRPVFKEITL 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 58a pep EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM 
| | | | | I I I I || I I I I I I II I I I II I I I I II I I I I I I I I I I I I M I I I I II I I 1 I II I I : I 
orf 58-1 EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 58a . pep FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFRRHAGQGKGQAEAKSPDVS 
I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I 
orf 58-1 FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 58a. pep QGQSVSDGTAVRDAXRRVSVWLKEPNKATVSAEARISRLIPESRTWGKRDVEMPSETEN 
1 I I M I I II I 1 I I I I I II I I I II I I I I I I M I II I I I I I I I I : I II I I I I I I II M I I I 
orf 58-1 QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESQTWGKRDVEMPSETEN 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 58a . pep VFTEXVSSVGYGXPVYDETADIHIEEPAAPDAWWEPPEVPKVPMPAXDIPPPPPVSEIY 
I I II : I I II I I I I I I I I I I I I I I I I I 1 M I I I I I I I I I I I I I I I I II I I I I I I I I I 
orf 58-1 VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWWEPPEVPKVPMTAIDIQPPPPVSEIY 

370 380 390 400 410 420 

430 440 450 460 470 480 

or f 58a . pep NRTYEPPAGFEQVQRSRIAETDHLADDVLNGGWQEETAAIANDGSEGVAERSSGQYLSET 



490 500 510 520 530 540 

EAFGHDSQAVCPFENVPSERPSRRAXDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP 
I I II I I II I I I I I I I I I II I I I I : I II I I I I I I I II I I I I I I I II I II II M I I I I I 
EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFNP 

490 500 510 520 530 540 

550 560 570 580 590 600 

GATQTEEXLLXNSITIEEKXAEFKVECVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKX 



610 620 630 640 650 660 

LARSLGVASIRWETILGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 
I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 
LARSLGVASIRWETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 

610 620 630 640 650 660 

670 680 690 700 710 720 
orf58a.pep TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58-1    TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY
                  670       680       690       700       710       720

                  730       740       750       760       770       780
orf58a.pep EGIPHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGXNQKIAEAAARGEKI
           |||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||
orf58-1    EGIPHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI
                  730       740       750       760       770       780

790 800 810 820 830 840 

GNPFSLTPDNPEPLXKLPFIWWDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 

| | | | | | | | | : | I || I I I II I I II I I I I II I I I I I I I I I I i I i I II I II II I I I I I I II I 
GNPFSLTPDDPEPLEKLPFIWWDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT 
790 800 810 820 830 840 

850 860 870 880 890 900 

QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQR 

M II I I I I II I I M I I I I I I I I I I I I I I I I I I I I I M I I I I II I I I I I I I I I 1 I I I M I 
QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLLPGTAYPQR 
850 860 870 880 890 900 

910 920 930 940 950 960 

VHGAFASDEEVHRVVEYLKQFGEPDYVDDXLSGGMSDDLLGISRSGDGETDPMYDEAVSV 
I I I I I I I I I I 11 I I I I I I I I I I I I I I I I I I I I I I = = I I I : I I I I I I I I I I I I I II I 
orf58-l VHGAFASDEEVHRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDDETDPMYDEAVSV 
910 920 930 940 950 960 

970 980 990 1000 1010 

orf58a pep VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPXDNAX 

! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I MINIM MM 

orf58-l VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX 

970 980 990 1000 1010 

Homology with a predicted ORF from N. gonorrhoeae

ORF58 shows complete identity over a 9 aa overlap with a predicted ORF (ORF58ng) from N.
gonorrhoeae:

orf58.pep  ALMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPP  103
                                         |||||||||
orf58ng                                  SEPDRPVPPASANRADVPTASDGYSDSGNG  30

The ORF58ng nucleotide sequence <SEQ ID 493> is predicted to encode a protein having partial
amino acid sequence <SEQ ID 494>:

1 ..SEPDRPVPPA SANRADVPTA SDGYSDSGNG TEEAETEAAE AAEEEAADTE
51 DIATAVIDNR RIPFDRSIAE GLMQSESKTS PVRPVFKEIT LEEATRALSS
101 AALRETKKRY IDAFEKNGTA VPKVRVSDTP MEGLQIIGLD DPVLQRTYSR
151 MFDADKEAFS ESADYGFEPY FEKQHPSAFS AVKAENARNA PFRRHAGQEK
201 GQAEAKSPDV SQGQSVSDGT AVRDARRRVS VNLKEPNKAT VSAEARISRL
251 IPESRTVVGK RDVEMPSETE NVFTETVSSV GYGGPVYDEA ADIHIEEPAA
301 PDAWVVEPPE VPEVAVPEID ILPPPPVSEI YNRTYEPPAG FEQAQRSRIA
351 ETDHLAADVL NGGWQEETAA IADDGSEGAA ERSSGQYLSE TEAFGHDSQA
401 VCPFEDVPSE RPSCRVSDTE ADEGAFQSEE TGAVSEHLPT TDLLLPPLFN
451 PEATQTEEEL LENSITIEEK LAEFKVKVKV VDSYSGPVIT RYEIEPDVGV
501 RGNSVLNLEK DLARSLGVAS IRVVETIPGK TCMGLELPNP KRQMIRLSEI
551 FNSPEFAESK SKLTLALGQD ITGQPVVTDL GKAPHLLVAG TTGSGKSVGV
601 NAMILSMLFK AAPEDVRMIM IDPKMLELSI YEGITHLLAP VVTDMKLAAN
651 ALNWCVNEME KRYRLMSFMG VRNLAGFNQK IAEAAARGEK IGNPFSLTPD
701 DPEPLEKLPF IVVVVDEFAD LMMTAGKKIE ELIARLAQKA RAAGIHLILA
751 TQRPSVDVIT GLIKANIPTR IAFQVSSKID SRTILDQMGA ENLLGQGDML
801 FLPPGTAYPQ RVHGAFASDE EVHRVVEYLK QFGEPDYVDD ILSGGGSEEL
851 PGIGRSGDGE TDPMYDEAVS VVLKTRKASI SGVQRALRIG YNRAARLIDQ
901 MEAEGIVSAP EHNGNRTILV PLDNA*

This partial gonococcal sequence contains a predicted transmembrane region and a predicted
ATP/GTP-binding site motif A (P-loop; double underlined). Furthermore, it has a domain
homologous to the FtsK cell division protein of E. coli. Alignment of ORF58ng and FtsK
(accession number P46889) shows 65% amino acid identity in a 459 aa overlap:

ORF58ng:  467 IEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKDLARSLGVASIRVVET 526
              +E +LA+F++K  VV+   GPVITR+E+    GV+   + NL +DLARSL   ++RVVE
FtsK:     868 VEARLADFRIKADVVNYSPGPVITRFELNLAPGVKAARISNLSRDLARSLSTVAVRVVEV 927

ORF58ng:  527 IPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDITGQPVVTDLGKAPHL 586
              IPGK  +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PVV DL K PHL
FtsK:     928 IPGKPYVGLELPNKKRQTVYLREVLDNAKFRDNPSPLTVVLGKDIAGEPVVADLAKMPHL 987

ORF58ng:  587 LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAPVVTDMK 646
              LVAGTTGSGKSVGVNAMILSML+KA PEDVR IMIDPKMLELS+YEGI HLL  VVTDMK
FtsK:     988 LVAGTTGSGKSVGVNAMILSMLYKAQPEDVRFIMIDPKMLELSVYEGIPHLLTEVVTDMK 1047

ORF58ng:  647 LAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKIGNPFSLTPDDPEP-- 704
               AANAL WCVNEME+RY+LMS +GVRNLAG+N+KIAEA      I +P+    D  +
FtsK:    1048 DAANALRWCVNEMERRYKLMSALGVRNLAGYNEKIAEADRMMRPIPDPYWKPGDSMDAQH 1107

ORF58ng:  705 --LEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILATQRPSVDVITGL 762
                L+K P+IVV+VDEFADLMMT GKK+EELIARLAQKARAAGIHL+LATQRPSVDVITGL
FtsK:    1108 PVLKKEPYIVVLVDEFADLMMTVGKKVEELIARLAQKARAAGIHLVLATQRPSVDVITGL 1167

ORF58ng:  763 IKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQRVHGAFASDEEV 822
              IKANIPTRIAF VSSKIDSRTILDQ GAE+LLG GDML+  P +  P RVHGAF  D+EV
FtsK:    1168 IKANIPTRIAFTVSSKIDSRTILDQAGAESLLGMGDMLYSGPNSTLPVRVHGAFVRDQEV 1227

ORF58ng:  823 HRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSVVLKTRKASISG 882
              H VV+  K  G P YVD I S   SE   G G  G  E DP++D+AV  V + RKASISG
FtsK:    1228 HAVVQDWKARGRPQYVDGITSDSESEGGAG-GFDGAEELDPLFDQAVQFVTEKRKASISG 1286

ORF58ng:  883 VQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVP 921
              VQR  RIGYNRAAR+I+QMEA+GIVS   HNGNR +L P
FtsK:    1287 VQRQFRIGYNRAARIIEQMEAQGIVSEQGHNGNREVLAP 1325
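
The ATP/GTP-binding site motif A (P-loop) noted for this sequence can be located with a simple pattern scan. The following is a minimal sketch only: it assumes the PROSITE consensus for this motif (PS00017, [AG]-x(4)-G-K-[ST]), since the tool and pattern actually used for the prediction are not stated here, and the function name find_p_loops is illustrative.

```python
import re

# Assumed consensus for ATP/GTP-binding site motif A (P-loop),
# PROSITE PS00017: [AG]-x(4)-G-K-[ST]. The lookahead allows
# overlapping candidates to be reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein):
    """Return (1-based start, matched text) for every P-loop candidate."""
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(protein)]


# Fragment of the partial ORF58ng sequence from the FtsK-like region.
fragment = "GKAPHLLVAGTTGSGKSVGVNAMILSMLFK"
print(find_p_loops(fragment))
```

On this fragment the scan reports a single candidate, GTTGSGKS, which lies in the stretch that is identical between ORF58ng and FtsK in the alignment.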

Further work on ORF58ng revealed the complete gonococcal DNA sequence to be <SEQ ID 495>: 

1 ATGTTTTGGA TAGTTTTGAT CGTTATtgtg TTGCTTGCGC TTGCCGGCCT
51 GTTTTTTGTC CGCGCACAAT CCGAACGCGA GTGGATGCGC GAGGTTTCTG
101 CGTGGCAGGA AAAGAAAGGG GAAAAACAGG CGGAGCTGCC TGAAATCAAA
151 GACGGTATGC CCGATTTTCC CGAGTTTTCC CTGATGCTTT TCCATGCCGT
201 CAAAACGGCA GTGTATTGGC TGTTTGTCGG TGTCGTCCGT TTCTGCCGAA
251 ACTATCTGGC GCACGAATCC GAACCGGACA GGCCCGTTCC GCCTGCTTCT
301 GCAAACCGTG CGGATGTTCC GACCGCATCC GACGGGTATT CAGACAGTGG
351 AAACGGGACG GAAGAAGCGG AAACGGAAGC AGCAGAAGCT GCGGAGGAAG
401 AGGCTGCCgA TACgGAAGAC ATTGCAACTG CCGTAATCGA CAACCGCCGC
451 ATCCcatTCG ACCGGAGTAT TGCTGAAGGG TTGATGCAGT CTGAAAGCAA
501 AACTTCGCCC GTCCGTCCGG TTTTTAAGGA AATCACTTTG GAAGAAGCAA
551 CGCGTGCTTT AAGCAGCGCG GCTTTAAGGG AAACGAAAAA ACGCTATATC
601 GATGCATTTG AGAAAAACGG AACAGCCGTC CCCAAAGTAC GCGTGTCCGA
651 TACCCCGATG GAAGGGCTGC AGATTATCGG TTTGGACGAC CCTGTGCTTC
701 AACGCACGTA TTCCCGTATG TTTGATGCGG ACAAAGAAGC GTTTTCCGAG
751 TCTGCGGATT ACGGATTTGA GCCGTATTTT GAGAAGCAGC ATCCGTCTGC
801 CTTTTCTGCA GTCAAAGCCG AAAATGCACG GAATGCGCCG TTCCGCCGTC
851 ATGCAGGGCA GGAGAAAGGG CAGGCGGAGG CAAAATCCCC GGATGTTTCC
901 CAAGGGCAGT CCGTTTCAGA CGGCACAGCC GTCCGCGATG CCCGCCGCCG
951 CGTTTCCGTC AATTTGAAAG AACCGAACAA GGCAACGGTT TCTGCGGAGG
1001 CGCGGATTTC GCGCCTGATT CCGGAAAGTC GGACGGTTGT CGGGAAACGG
1051 GATGTCGAAA TGCCGTCTGA AACCGAAAAT GTTTTCACGG AAACCGTTTC
1101 GTCTGTGGGA TACGGCGGTC CGGTTTATGA TGAAGCTGCC GATATCCATA
1151 TTGAAGAGCC TGCCGCGCCC GATGCTTGGG TGGTCGAACC ACCCGAAGTG
1201 CCGGAGGTAG CCGTACCCGA AATCGATATT CTGCCGCCGC CTCCCGTATC
1251 GGAAATCTAC AACCGTACCT ATGAGCCGCC GGCAGGATTC GAGCAGGCGC
1301 AACGCAGCCG CATTGCCGAA ACCGACCATC TTGCCGCTGA TGTTTTGAAT



1351 GGAGGTTGGC AGGAGGAAAC CGCCGCTATT GCAGATGACG GCAGTGAGGG
1401 TGCGGCAGAG CGGTCAAGCG GGCAATATCT GTCGGAAACC GAAGCGTTCG
1451 GGCATGACAG TCAGGCGGTT TGTCCGTTTG AAGATGTGCC GTCTGAACGC
1501 CCGTCCTGCC GGGTATCGGA TACGGAAGCG GATGAAGGGG CGTTCCAATC
1551 GGAAGAGACC GGTGCGGTAT CCGAACACCT GCCGACAACC GACCTGCTTC
1601 TGCCTCCGCT GTTCAATCCC GAGGCGACGC AAACCGAAGA AGAACTGTTG
1651 GAAAACAGCA TCACCATCGA AGAAAAATTG GCGGAGTTCA AAGTCAAGGT
1701 CAAGGTTGTC GATTCTTATT CCGGCCCCGT GATTACGCGT TATGAAATCG
1751 AACCCGATGT CGGCGTGCGC GGCAATTCCG TTCTGAATTT GGAAAAAGAC
1801 TTGGCGCGTT CGCTCGGCGT GGCTTCCATC CGCGTTGTCG AAACCATCCC
1851 CGGCAAAACC TGCATGGGTT TGGAACTTCC GAACCCGAAA CGCCAAATGA
1901 TACGCCTGAG CGAAATTTTC AATTCGCCCG AGTTTGCCGA ATCCAAATCC
1951 AAGCTGACGC TCGCGCTCGG TCAGGACATT ACCGGACAGC CCGTCGTAAC
2001 CGACTTGGGC AAAGCACCGC ATTTGCTGGT TGCCGGCACG ACCGGTTCGG
2051 GCAAATCGGT GGGTGTCAAC GCGATGATTC TGTCTATGCT TTTCAAAGCC
2101 GCGCCGGAAG ACGTGCGTAT GATTATGATC GATCCGAAAA TGCTGGAATT
2151 GAGCATTTAC GAAGGCATCA CGCACCTGCT CGCCCCTGTC GTTACCGATA
2201 TGAAGCTGGC GGCAAACGCG CTGAACTGGT GTGTTAACGA AATGGAAAAA
2251 CGCTACCGCC TGATGAGCTT TATGGGCGTG CGCAATCTTG CGGGCTTCAA
2301 CCAAAAAATC GCCGAAGCCG CAGCAAGGGG AGAAAAAATC GGCAATCCGT
2351 TCAGCCTCAC GCCCGACGAT CCCGAACCTT TGGAAAAACT GCCGTTTATC
2401 GTGGTCGTGG TCGATGAGTT TGCCGATTTG ATGATGACGG CAGGCAAGAA
2451 AATCGAAGAA CTGATTGCGC GCCTCGCCCA AAAAGCCCGC GCGGCAGGCA
2501 TCCACCTTAT CCTTGCCACA CAACGCCCCA GCGTCGATGT CATCACGGGT
2551 CTGATTAAGG CGAACATCCC GACGCGTATC GCGTTCCAAG TGTCCAGCAA
2601 AATCGACAGC CGCACGATTC TCGACCAAAT GGGCGCGGAA AACCTGCTCG
2651 GTCAGGGCGA TATGCTGTTC CTGCCGCCGG GTACTGCCTA TCCGCAGCGC
2701 GTTCACGGCG CGTTTGCCTC GGATGAAGAG GTGCACCGCG TGGTCGAATA
2751 TCTGAAGCAG TTTGGCGAGC CGGACTATGT TGACGATATT TTGAGCGGCG
2801 GCGGCAGCGA AGAGCTGCCC GGCATCGGGC GCAGCGGCGA CGGCGAAACC
2851 GATCCGATGT ACGACGAGGC CGTATCCGTT GTCCTGAAAA CGCGCAAAGC
2901 CAGCATTTCG GGCGTACAGC GCGCCTTGCG CATCGGCTAC AACCGCGCCG
2951 CGCGTCTGAT TGACCAAATG GAAGCGGAAG GCATTGTGTC CGCACCGGAA
3001 CACAACGGCA ACCGTACGAT TCTCGTCCCC TTGGACAATG CTTGA

This corresponds to the amino acid sequence <SEQ ID 496; ORF58ng-1>:

1 MFWIVLIVIV LLALAGLFFV RAQSEREWMR EVSAWQEKKG EKQAELPEIK
51 DGMPDFPEFS LMLFHAVKTA VYWLFVGVVR FCRNYLAHES EPDRPVPPAS
101 ANRADVPTAS DGYSDSGNGT EEAETEAAEA AEEEAADTED IATAVIDNRR
151 IPFDRSIAEG LMQSESKTSP VRPVFKEITL EEATRALSSA ALRETKKRYI
201 DAFEKNGTAV PKVRVSDTPM EGLQIIGLDD PVLQRTYSRM FDADKEAFSE
251 SADYGFEPYF EKQHPSAFSA VKAENARNAP FRRHAGQEKG QAEAKSPDVS
301 QGQSVSDGTA VRDARRRVSV NLKEPNKATV SAEARISRLI PESRTVVGKR
351 DVEMPSETEN VFTETVSSVG YGGPVYDEAA DIHIEEPAAP DAWVVEPPEV
401 PEVAVPEIDI LPPPPVSEIY NRTYEPPAGF EQAQRSRIAE TDHLAADVLN
451 GGWQEETAAI ADDGSEGAAE RSSGQYLSET EAFGHDSQAV CPFEDVPSER
501 PSCRVSDTEA DEGAFQSEET GAVSEHLPTT DLLLPPLFNP EATQTEEELL
551 ENSITIEEKL AEFKVKVKVV DSYSGPVITR YEIEPDVGVR GNSVLNLEKD
601 LARSLGVASI RVVETIPGKT CMGLELPNPK RQMIRLSEIF NSPEFAESKS
651 KLTLALGQDI TGQPVVTDLG KAPHLLVAGT TGSGKSVGVN AMILSMLFKA
701 APEDVRMIMI DPKMLELSIY EGITHLLAPV VTDMKLAANA LNWCVNEMEK
751 RYRLMSFMGV RNLAGFNQKI AEAAARGEKI GNPFSLTPDD PEPLEKLPFI
801 VVVVDEFADL MMTAGKKIEE LIARLAQKAR AAGIHLILAT QRPSVDVITG
851 LIKANIPTRI AFQVSSKIDS RTILDQMGAE NLLGQGDMLF LPPGTAYPQR
901 VHGAFASDEE VHRVVEYLKQ FGEPDYVDDI LSGGGSEELP GIGRSGDGET
951 DPMYDEAVSV VLKTRKASIS GVQRALRIGY NRAARLIDQM EAEGIVSAPE
1001 HNGNRTILVP LDNA*

ORF58ng-1 and ORF58-1 show 97.2% identity in 1014 aa overlap:

                      10        20        30        40        50        60
orf58-1.pep   MFWIVLIVILLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPELA
orf58ng-1     MFWIVLIVIVLLALAGLFFVRAQSEREWMREVSAWQEKKGEKQAELPEIKDGMPDFPEFS
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf58-1.pep   LMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT
orf58ng-1     LMLFHAVKTAVYWLFVGVVRFCRNYLAHESEPDRPVPPASANRADVPTASDGYSDSGNGT
                      70        80        90       100       110       120

130 140 150 160 170 180 

or f 58-1 pep EEAETEEAEAAEEEAADTEDIATAVI DNRRIPFDRS IAEGLMPSE SE IS PVRPVFKEITL 
| | | | | | | M | | | | I I i I I I 1 I I I M I 1 M I M 1 I I I I I I I I Ml: I I I I I I I I I I I I 
nrfRRnrr-1 EEAETE AAEAAEEEAADTED I AT AVI DNRRIPFDRS I AEGLMQSE SKTS PVRPVFKE ITL 
° r " S 9 130 140 150 160 170 180 

190 200 210 220 230 240 

orf58-l Pep EEATRALNSAALRETKKRYIDAFEKNETAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSHM 

I i i j i ; I : I s 1 1 1 1 I m ! 1 1 1 1 : 1 1 1 iiiiiiiiiii ni : i 

orf58nq-l EEATRALSSAALRETKKRYIDAFEKNGTAVPKVRVSDTPMEGLQIIGLDDPVLQRTYSRM 
190 200 210 220 230 240 

250 260 270 280 290 300 

orf58-l pep FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFHRHAGQGKGQAEAKSPDVS 
| | | | | | I I I I I M I I I I I I I I II I I I I I I I i I I I I I I II I I : I I I ! I I I II I I I I I II I 
orf58ng-l FDADKEAFSESADYGFEPYFEKQHPSAFSAVKAENARNAPFRRHAGQEKGQAEAKSPDVS 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf58-l pep QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESQTWGKRDVEMPSETEN 
| | M | || || | I I I I I 1 I I I I I I I I I I II I I 1 I II I I I I I I I I I = I I I I M II I I II I I I ! 
orf58ng-l QGQSVSDGTAVRDARRRVSVNLKEPNKATVSAEARISRLIPESRTVVGKRDVEMPSETEN 
310 320 330 340 350 360 

370 380 390 400 410 420 

orf58-l pep VFTETVSSVGYGGPVYDETADIHIEEPAAPDAWWEPPEVPKVPMTAIDIQPPPPVSEIY 
| | | | | | I I I I II I I I I I I : I I I I M I I I I I I II I 1 I I I I I I : I : III I I I I II I I I 
orf58ng-l VFTETVSSVGYGGPVYDEAADIHIEEPAAPDAWWEPPEVPEVAVPEIDILPPPPVSEIY 
370 380 390 400 410 420 

430 440 450 460 470 480 

orf58-l pep NRTYEPPSGFEQVQRSRIAETDHLADDVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 
I | | | | | I : I I II : I I I I I II I I I I I I I I I I I I I I I I 1 I M II I I I II II I I M II I I M 
orf58ng-l NRTYEPPAGFEQAQRSRIAETDHLAADVLNGGWQEETAAIADDGSEGAAERSSGQYLSET 

430 440 450 460 470 480 

490 500 510 520 530 540 

orf 58-1. pep EAFGHDSQAVCPFENVPSERPSCRVSDTEADEGAFPSEETGAVSEHLPTTDLLLPPLFNP 
| | | | | || I I I I I I I : I I I I I I 1 II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I M 
orf58ng-l EAFGHDSQAVCPFEDVPSERPSCRVSDTEADEGAFQSEETGAVSEHLPTTDLLLPPLFNP 
490 500 510 520 530 540 

550 560 570 580 590 600 

orf 58-1. pep EATQTEEELLENSITIEEKLAEFKVKVKWDSYSGPVITRYEIEPDVGVRGNSVLNLEKD 
I I I I I II I I I I I I I I I I I I I I I M II I I II II I I I I I I I I 1 I I I I I M I I I I I I I I I I I I 
orf58ng-1 EATQTEEELLENSITIEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKD

550 560 570 580 590 600 

610 620 630 640 650 660 

orf 58-1. pep LARSLGVASIRWETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 
I I I I I I I I II I I I I I I II I I 1 I I II I M M I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
orf58ng-l LARSLGVASIRVVETIPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDI 

610 620 630 640 650 660 

670 680 690 700 710 720 

orf 58-1 . pep TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY 

I I I I I 1 I I II I I I I I I I I I I I I II I I 1 I 1 I I II I I I I I I I I I I II I I I I I I I I I I 

orf58ng-1 TGQPVVTDLGKAPHLLVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIY

670 680 690 700 710 720 

730 740 750 760 770 780 

orf 58-1 . pep EGIPHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI 
III I I I I I I I I I I I I I I I II I I I I I I I I I I I I I II M I I I I I I I I I I I I II I I I I I I I I 
orf58ng-l EGITHLLAPVVTDMKLAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKI 

730 740 750 760 770 780 



                     790       800       810       820       830       840
orf58-1.pep   GNPFSLTPDDPEPLEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58ng-1     GNPFSLTPDDPEPLEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILAT
                     790       800       810       820       830       840

                     850       860       870       880       890       900
orf58-1.pep   QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLLPGTAYPQR
              ||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||||
orf58ng-1     QRPSVDVITGLIKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQR
                     850       860       870       880       890       900

                     910       920       930       940       950       960
orf58-1.pep   VHGAFASDEEVHRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDDETDPMYDEAVSV
              ||||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||
orf58ng-1     VHGAFASDEEVHRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSV
                     910       920       930       940       950       960

                     970       980       990      1000      1010
orf58-1.pep   VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX
              |||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf58ng-1     VLKTRKASISGVQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVPLDNAX
                     970       980       990      1000      1010



Furthermore, ORF58ng-1 shows significant homology to the E. coli protein FtsK:

sp|P46889|FTSK_ECOLI CELL DIVISION PROTEIN FTSK >gi|1651412|gnl|PID|d1015290 (D...
division protein FtsK [Escherichia coli] >gi|1651418|gnl|PID|d1015296 (D90727) Cell
division protein FtsK [Escherichia coli] >gi|1787117 (AE000191) cell division
protein FtsK [Escherichia coli] Length = 1329
Score = 576 bits (1469), Expect = e-163
Identities = 301/459 (65%), Positives = 353/459 (76%), Gaps = 5/459 (1%)

Query:  556 IEEKLAEFKVKVKVVDSYSGPVITRYEIEPDVGVRGNSVLNLEKDLARSLGVASIRVVET 615
            +E +LA+F++K  VV+   GPVITR+E+    GV+   + NL +DLARSL   ++RVVE
Sbjct:  868 VEARLADFRIKADVVNYSPGPVITRFELNLAPGVKAARISNLSRDLARSLSTVAVRVVEV 927

Query:  616 IPGKTCMGLELPNPKRQMIRLSEIFNSPEFAESKSKLTLALGQDITGQPVVTDLGKAPHL 675
            IPGK  +GLELPN KRQ + L E+ ++ +F ++ S LT+ LG+DI G+PVV DL K PHL
Sbjct:  928 IPGKPYVGLELPNKKRQTVYLREVLDNAKFRDNPSPLTVVLGKDIAGEPVVADLAKMPHL 987

Query:  676 LVAGTTGSGKSVGVNAMILSMLFKAAPEDVRMIMIDPKMLELSIYEGITHLLAPVVTDMK 735
            LVAGTTGSGKSVGVNAMILSML+KA PEDVR IMIDPKMLELS+YEGI HLL  VVTDMK
Sbjct:  988 LVAGTTGSGKSVGVNAMILSMLYKAQPEDVRFIMIDPKMLELSVYEGIPHLLTEVVTDMK 1047

Query:  736 LAANALNWCVNEMEKRYRLMSFMGVRNLAGFNQKIAEAAARGEKIGNPFSLTPDDPEP-- 793
             AANAL WCVNEME+RY+LMS +GVRNLAG+N+KIAEA      I +P+    D  +
Sbjct: 1048 DAANALRWCVNEMERRYKLMSALGVRNLAGYNEKIAEADRMMRPIPDPYWKPGDSMDAQH 1107

Query:  794 --LEKLPFIVVVVDEFADLMMTAGKKIEELIARLAQKARAAGIHLILATQRPSVDVITGL 851
              L+K P+IVV+VDEFADLMMT GKK+EELIARLAQKARAAGIHL+LATQRPSVDVITGL
Sbjct: 1108 PVLKKEPYIVVLVDEFADLMMTVGKKVEELIARLAQKARAAGIHLVLATQRPSVDVITGL 1167

Query:  852 IKANIPTRIAFQVSSKIDSRTILDQMGAENLLGQGDMLFLPPGTAYPQRVHGAFASDEEV 911
            IKANIPTRIAF VSSKIDSRTILDQ GAE+LLG GDML+  P +  P RVHGAF  D+EV
Sbjct: 1168 IKANIPTRIAFTVSSKIDSRTILDQAGAESLLGMGDMLYSGPNSTLPVRVHGAFVRDQEV 1227

Query:  912 HRVVEYLKQFGEPDYVDDILSGGGSEELPGIGRSGDGETDPMYDEAVSVVLKTRKASISG 971
            H VV+  K  G P YVD I S   SE   G G  G  E DP++D+AV  V + RKASISG
Sbjct: 1228 HAVVQDWKARGRPQYVDGITSDSESEGGAG-GFDGAEELDPLFDQAVQFVTEKRKASISG 1286

Query:  972 VQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVP 1010
            VQR  RIGYNRAAR+I+QMEA+GIVS   HNGNR +L P
Sbjct: 1287 VQRQFRIGYNRAARIIEQMEAQGIVSEQGHNGNREVLAP 1325
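
The identity and gap figures quoted for this alignment are simple column counts over the aligned sequences. As an illustration only (the function name alignment_stats and the use of '-' as the gap character are assumptions of this sketch, not part of the analysis reported here), the counts for the final block of the alignment (Query 972-1010, Sbjct 1287-1325) can be obtained as follows.

```python
def alignment_stats(query, subject, gap="-"):
    """Return (identities, gap columns, alignment length) for two
    equal-length aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have equal length")
    pairs = list(zip(query, subject))
    identities = sum(q == s and q != gap for q, s in pairs)
    gaps = sum(q == gap or s == gap for q, s in pairs)
    return identities, gaps, len(pairs)


# Final block of the ORF58ng-1 / FtsK alignment.
query = "VQRALRIGYNRAARLIDQMEAEGIVSAPEHNGNRTILVP"
sbjct = "VQRQFRIGYNRAARIIEQMEAQGIVSEQGHNGNREVLAP"
ident, gaps, length = alignment_stats(query, sbjct)
print(ident, gaps, length)

# Overall figure reported above: 301 identities in 459 columns.
print(f"{100 * 301 / 459:.1f}%")
```

The block contributes 28 identities in 39 columns with no gaps, and 301/459 evaluates to 65.6%, which the report above shows as 65%.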





Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
Example 59

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 497>:

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG 

51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC 

101 TGCTCGGCCG TGCCGCCGAC GGGC..GTGA TCGCCATCGA TGCCGTGTTG 

151 GCATTGGTCG GCTTCTGGGT C 

// 

901 A TTGCCATCGG TTTGTTTTTA ATTTACCAAA ACGGGCTGAC 

951 CCTGCTTTTT GAAGCCGTGG AAGACGGCAA AATCCATTTT TGGCTCGGAC 

1001 TGCTGCCTAT GCACATTATC ATGTTTGTCC TTGCACTCAT CCTGTTGCGC 

1051 GTCCGCAGTA TGCCCAGCCA GCCCTTCTGG CAGGCGGTTG GCAAAAGTCT 

1101 GACATTGAAA GGCGGAAAAT GA 

This corresponds to the amino acid sequence <SEQ ID 498; ORF101>: 



Further work revealed the complete nucleotide sequence <SEQ ID 499>: 

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG
51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC
101 TGCTCGGCCG TGCCGCCGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCA
151 TTGGTCGGCT TCTGGGTCAT CGGTATGACG CCGCTTTTGC TGGTGTTGAC
201 CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGCGACAGCG
251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CATTGAAACA ATGGATACGC
301 CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA
351 GCTTTGGGTG ATACCGTGGG CAGAGCTACG CAGCCGCGAA TACGCTGAAA
401 TCCTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAGGCAGG CGAGTTCAAC
451 AGTTTGGGCA AGCGCAACGG CAGGGTTTAT TTTGTCGAAA CCTTCGATAC
501 CGAATCCGGC ATCATGAAAA ACCTGTTCCT GCGCGAACAG GACAAAAACG
551 GCGGCGACAA CATCATCTTC GCCAAAGAAG GTAACTTCTC GCTGAACGAC
601 AACAAACGCA CGCTCGAATT GCGCCACGGC TACCGTTACA GCGGCACGCC
651 CGGACGCGCC GACTACAATC AGGTTTCCTT CCAAAAACTC AACCTGATTA
701 TCAGCACCAC GCCCAAACTC ATCGACCCCG TTTCCCACCG CCGTACCATT
751 CCGACCGCCC AACTGATTGG CAGCAGCAAC CCGCAACATC AGGCGGAATT
801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG
851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC
901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT
951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC
1001 CTATGCACAT TATCATGTTT GCCGTTGCAC TCATCCTGTT GCGCGTCCGC
1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT
1101 GAAAGGCGGA AAATGA

This corresponds to the amino acid sequence <SEQ ID 500; ORF101-1>: 

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA
51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR
101 PVMQFAVPFA VLVAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN
151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLND
201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI
251 PTAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI
301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF AVALILLRVR
351 SMPSQPFWQA VGKSLTLKGG K*
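
The sequence listings in these examples follow a fixed layout: a position label for the first residue of the row, followed by five blocks of ten residues. A minimal sketch of that layout is given below for anyone re-deriving a listing from a raw sequence string; the function name format_listing and the right-aligned label width are choices made for this sketch only.

```python
def format_listing(seq, block=10, per_line=5):
    """Render a sequence as numbered rows of space-separated blocks."""
    width = block * per_line
    rows = []
    for start in range(0, len(seq), width):
        chunk = seq[start:start + width]
        blocks = [chunk[i:i + block] for i in range(0, len(chunk), block)]
        # The label is the 1-based position of the first residue in the row.
        rows.append(f"{start + 1:>5} " + " ".join(blocks))
    return "\n".join(rows)


# First 60 residues of ORF101-1 as listed above.
print(format_listing(
    "MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT"))
```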

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF101 shows 91.2% identity over a 57aa overlap and 95.7% identity over a 69aa overlap with
an ORF (ORF101a) from strain A of N. meningitidis:
10 20 30 40 50

orf101.pep MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGXVIAIDAVLALVGFWVX
I I I 1 1 I I I I I I I I M I I M I I II I I I II M I I I I I I III MINIMUM!! 
orflOla MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGXAADXRX-AIDAVLALVGFWVXXM 
10 20 30 40 50 

// 

90 100 110 

orf 101 .pep IAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL 

M I 1 1 I 1 I I I I I I I I I I I I I M I M I 1 II I 
orf 101a LTVSVLLLCLLAVPLSYFNPRSGHTYNILXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGL 
280 290 300 310 320 330 



120 130 140 150 

orf 101 .pep LPMHIIMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGKX 
II I I I I II I : I : : I I II I I I I I I M I I 1 M I I I I I I I I I I 
orf 101a LPMHIIMFVIAIVLLRVRSMPSQPFWQAVGKSLTLKGGKX 
340 350 360 370 

The complete length ORF101a nucleotide sequence <SEQ ID 501> is:



1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG
51 CATTTTCGTC GTCCTCTTGG CGGTATTGGT CTCCACGCAG GCAATCAACC
101 TGCTCGGCCN TGCCGCCGAC NGGCGTNTCG CCATCGATGC CGTGTTGGCA
151 TTGGTCGGCT TCTGGGTCNN NNGNATGACG CCGCTTTTGC TNGTGTTGAC
201 CGCATTTATC AGTACGTTGA CCGTGTTGAC CCGCTACTGG CGNGACAGCG
251 AAATGTCGGT CTGGNTATCC TGCGGATTGG CATTGAAACA ATGGATACGC
301 CCGGTGATGC AGTTTGCCGT GCCGTTTGCC GTTTTGGTTG CCGTCATGCA
351 GCTTTGGGTG ATACCGTGGG CAGAGCTACG CAGCCGCGAA TACGCTGAAA
401 TCCTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAGGCAGG CGGGTTCAAC
451 AGTTTGGGCA AGCGCAACGG CAGGGTTTAT TTTGTCGAAA CCTTCGATAC
501 CGAATCCGGC ATCATGAAAA ACCTGTTCCT GCGCGAACAG GACAAAAACG
551 GCGGCGACAA CATCATCTTC NCCAAAGAAA GTAACTTCTC GCTGAACGAC
601 AACAAACGCA CGCTCGAATT GCGCCACGGC TACCGTTACA GCGGCACGCC
651 CGGACGCGCC GACTACAATC AGGTTTCCTT CCNAAAACTC AACCTGATTA
701 TCAGCACCAC GCCCAAACTC ATCGACCCCG TTTCCCACCG CCGTACNATN
751 CCNACNGCCC AACTGATTGG CAGCAGCAAC CCGCAACATC ANGCGGAATT
801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTACTC TGCCTGCTTG
851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC
901 TTGANTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT
951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC
1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC
1051 AGCATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT
1101 GAAAGGCGGA AAATGA

This encodes a protein having amino acid sequence <SEQ ID 502>: 



1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGXAAD XRXAIDAVLA
51 LVGFWVXXMT PLLLVLTAFI STLTVLTRYW RDSEMSVWXS CGLALKQWIR
101 PVMQFAVPFA VLVAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGGFN
151 SLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF XKESNFSLND
201 NKRTLELRHG YRYSGTPGRA DYNQVSFXKL NLIISTTPKL IDPVSHRRTX
251 PTAQLIGSSN PQHXAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI
301 LXAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR
351 SMPSQPFWQA VGKSLTLKGG K*

ORF101a and ORF101-1 show 95.4% identity in 371 aa overlap:

orf 101a . pep MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGXAADXRXAIDAVLALVGFWVXXMT 60 

I I I I I I I I I I M I I I I I I I I I I I I I I I I I M I I I I I Ml I I I I I I I I I I I I I I II 
orf 101-1 MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWIGMT 60 

orf 101a .pep PLLLVLTAFI STLTVLTRYWRDSEMSVWXSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 120 

1 ! 1 I I I I I M I I I I I I I I I I I I I I I I I I I II I I II II I I II II I I I I I I II 

orf 101-1 PLLLVLTAFI STLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV 120 

orflOla.pep I PWAE LRSREYAE I LKQKQE L S LVE AGGFN S LGKRNGRVYFVE TFDTESGI MKNLFLRE Q 180 

I I M M M I I I M II I I I I M I I I I I I I M I I I I I I I I I I I I I I I M I I I I I I I II I II 
orf 101-1 IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ 180 
orf101a.pep DKNGGDNIIFXKESNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFXKLNLIISTTPKL 240

M I I I I I I I I I I : I I I I I 1 I I 1 I I I I I I I 1 I I I 1 1 11 I 1 I I I 1 I 1 I I I I I I I I I I I I I 
or f 10 1-1 DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL 2 40 

orflOla.pep IDPVSHRRTXPTAQLIGSSNPQHXAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 300 

] I ] I I I I I I I I I I I II I I I I M II i I II 1 I I I II I I I II I I I I I I I I I I I M 1 I II M 
or f 10 1-1 IDPVSHRRTIPTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI 300 

orflOla.pep LXAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA 3 60 

I II I I I I I I M M M M M I I II I M I 1 I II II II M II : : I : : I II II I M I I I II II 
orf 101-1 LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA 3 60 

orflOla.pep VGKSLTLKGGK 371 

I I I II I I II I I 
orfl01-l VGKSLTLKGGK 371 

Homology with a predicted ORF from N. gonorrhoeae 

ORF101 shows 96.5% identity in 57aa overlap at the N-terminal domain and 95.1% identity in
61aa overlap at the C-terminal domain, respectively, with a predicted ORF (ORF101ng) from N.
gonorrhoeae:

orf 101 .pep MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGXVIAIDAVLALVGFWV 57 

I I I 1 I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II 
orf lOlng MIYQRNLIKELSFTAVGIFWLLAVLVSTQAINLLGRAADGRV-AIDAVLALVGFWVIGM 5 9 



orf 101. pep IAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 333 

I ! i I II II II II II II II II I I II I I I I ] I 
orflOlng SLTVSVLLLCLLAVPLSYFNPRSGHTYNILIAIGLFLIYQNGLTLLFEAVEDGKIHFWLG 331 

orf 101. pep LLPMHIIMFVLALILLRVRSMPSQPFWQAVGKSLTLKGGK 373 

II I i I M I II : I : : I I I II I I I I I I I I II I I 
orflOlng LLPMHIIMFVIAIVLLRVRSMPSQPFWQAVG 3 62 

The ORF101ng nucleotide sequence <SEQ ID 503> is predicted to encode a protein having partial
amino acid sequence <SEQ ID 504>:

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA
51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR
101 PVMQFAVPFA ILIAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN
151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD
201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI
251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI
301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR
351 SMPSQPFWQA VG...

Further work revealed the complete nucleotide sequence <SEQ ID 505>: 

1 ATGATTTATC AAAGAAACCT CATCAAAGAA CTCTCTTTTA CCGCCGTCGG

51 CATTTTCGTC GTCCTCTTGG CGGTGTTGGT GTCCACGCAG GCGATCAACC 

101 TGCTTGGCCG CGCAGCTGAC GGGCGTGTCG CCATCGATGC CGTGTTGGCC 

151 TTAGTCGGCT TCTGGGTCAT CGGTATGACC CCGCTTTTGC TGGTGTTGAC 

201 CGCATTCATC AGCACGCTGA CCGTATTGAC CCGCTACTGG CGCGACAGCG 

251 AAATGTCGGT CTGGCTATCC TGCGGATTGG CGTTGAAACA GTGGATACGC 

301 CCCGTCATGC AGTTTGCCGT GCCGTTTGCC ATCCTGATTG CCGTCATGCA 

351 GCTTTGGGTG ATACCGTGGG CAGAGCTGCG CAGCCGCGAA TATGCCGAAA 

401 TTTTGAAGCA GAAGCAGGAA TTGTCTTTGG TGGAAGCCGG CGAGTTCAAT 

451 AACTTGGGCA AGCGCAACGG CAgggtttaT TtcgtcgaaA CCTTTGACAC

501 CGaatccgGC ATCATGAAAA ACCTGTtcct GcGCGAACAG GACAAAAACG 

551 gcggcgacaA CATCATCTTC GCcaaaGAag gtaactTctc gctgaaggaC 

601 AACAAAcgca cgctcgaATT GCGCCACGGC TACCGTTACA GCGGcacgcC 

651 CGGacGCGCc gactaCAATC AGGTTtcctt cCAAAAacTc aacctgATta 

701 TCAGCACCAC GCCCAAacTT ATCGaccCCG TTTCCCACCG CCGCACCATT 






751 tcgacCGCCC AAcTGATTGG CAGCAGCAAT CCGCAACATC AGGCAGAATT

801 GATGTGGCGC ATCTCGCTGA CCGTCAGCGT CCTCCTGCTC TGCCTACTCG 

851 CCGTGCCGCT TTCCTATTTC AACCCGCGCA GCGGACATAC CTACAATATC 

901 TTGATTGCCA TCGGTTTGTT TTTAATTTAC CAAAACGGGC TGACCCTGCT 

951 TTTTGAAGCC GTGGAAGACG GCAAAATCCA TTTTTGGCTC GGACTGCTGC 

1001 CTATGCACAT CATCATGTTC GTCATCGCAA TCGTACTTCT GCGCGTCCGC 

1051 AGTATGCCCA GCCAGCCCTT CTGGCAGGCG GTTGGCAAAA GTCTGACATT 

1101 GAAAGgcgGA AAATGA 

This corresponds to the amino acid sequence <SEQ ID 506; ORF101ng-1>:

1 MIYQRNLIKE LSFTAVGIFV VLLAVLVSTQ AINLLGRAAD GRVAIDAVLA
51 LVGFWVIGMT PLLLVLTAFI STLTVLTRYW RDSEMSVWLS CGLALKQWIR
101 PVMQFAVPFA ILIAVMQLWV IPWAELRSRE YAEILKQKQE LSLVEAGEFN
151 NLGKRNGRVY FVETFDTESG IMKNLFLREQ DKNGGDNIIF AKEGNFSLKD
201 NKRTLELRHG YRYSGTPGRA DYNQVSFQKL NLIISTTPKL IDPVSHRRTI
251 STAQLIGSSN PQHQAELMWR ISLTVSVLLL CLLAVPLSYF NPRSGHTYNI
301 LIAIGLFLIY QNGLTLLFEA VEDGKIHFWL GLLPMHIIMF VIAIVLLRVR
351 SMPSQPFWQA VGKSLTLKGG K*

ORF101ng-1 and ORF101-1 show 97.6% identity in 371 aa overlap:

                      10        20        30        40        50        60
orf101-1.pep  MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf101ng-1    MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLALVGFWVIGMT
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf101-1.pep  PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAVLVAVMQLWV
              ||||||||||||||||||||||||||||||||||||||||||||||||||:|:|||||||
orf101ng-1    PLLLVLTAFISTLTVLTRYWRDSEMSVWLSCGLALKQWIRPVMQFAVPFAILIAVMQLWV
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf101-1.pep  IPWAELRSREYAEILKQKQELSLVEAGEFNSLGKRNGRVYFVETFDTESGIMKNLFLREQ
              ||||||||||||||||||||||||||||||:|||||||||||||||||||||||||||||
orf101ng-1    IPWAELRSREYAEILKQKQELSLVEAGEFNNLGKRNGRVYFVETFDTESGIMKNLFLREQ
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf101-1.pep  DKNGGDNIIFAKEGNFSLNDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL
              ||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf101ng-1    DKNGGDNIIFAKEGNFSLKDNKRTLELRHGYRYSGTPGRADYNQVSFQKLNLIISTTPKL
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf101-1.pep  IDPVSHRRTIPTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI
              |||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||
orf101ng-1    IDPVSHRRTISTAQLIGSSNPQHQAELMWRISLTVSVLLLCLLAVPLSYFNPRSGHTYNI
                     250       260       270       280       290       300

                     310       320       330       340       350       360
orf101-1.pep  LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFAVALILLRVRSMPSQPFWQA
              ||||||||||||||||||||||||||||||||||||||||::|::|||||||||||||||
orf101ng-1    LIAIGLFLIYQNGLTLLFEAVEDGKIHFWLGLLPMHIIMFVIAIVLLRVRSMPSQPFWQA
                     310       320       330       340       350       360

                     370
orf101-1.pep  VGKSLTLKGGKX
              ||||||||||||
orf101ng-1    VGKSLTLKGGKX
                     370



Based on this analysis, including the presence of a putative leader sequence (double-underlined)
and several putative transmembrane domains (single-underlined) in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
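
Putative transmembrane domains of the kind referred to above are commonly screened for with a sliding-window hydropathy average. The sketch below is illustrative only and is not the method used to annotate these sequences, which is not stated here: it assumes the Kyte-Doolittle hydropathy scale with a 19-residue window and a cut-off of 1.6, and the function name hydrophobic_windows is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this sketch).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (1-based start, mean hydropathy) for each window whose
    mean reaches the threshold. Unknown residues (e.g. X) score 0."""
    hits = []
    for i in range(len(protein) - window + 1):
        segment = protein[i:i + window]
        mean = sum(KD.get(aa, 0.0) for aa in segment) / window
        if mean >= threshold:
            hits.append((i + 1, round(mean, 2)))
    return hits


# N-terminal 50 residues of ORF101ng-1 as listed above.
n_terminus = "MIYQRNLIKELSFTAVGIFVVLLAVLVSTQAINLLGRAADGRVAIDAVLA"
print(hydrophobic_windows(n_terminus))
```

Windows that pass the cut-off are only candidates; overlapping hits would normally be merged into one segment and confirmed with a dedicated prediction tool.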

Example 60 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 507>: 

1 ..GGTGGTGGTT TTATCAATGC TTCCTGTGCC ACTTTGACGA CAGCCAAACC

51 GCAATATCAA GCAGGAGACC TTAGCGCTTT TAAGATAAGG CAAGGCAATG 

101 TTGTAATCGC CGGACACGGT TTGGATGCAC GTGATACCGA TTACACACGT 

151 ATTCTCAGTT ATCATTCCAA AATCGATGCA CCCGTATGGG GACAAGATGT 

201 TCGTGTCGTC GCGGGACAAA ACGATGTGGC CGCAACAGGT GATGCACATT 

251 CGCCTATTCT CAATAATGCT GCTGCCAATA CGTCAAACAA TACAGCCAAC

301 AACGGCACAC ATATCCCTTT ATTTGCGATT GATACAGGCA AATTAGGAGG 

351 TAT.GTATGC CAACAAAATC ACCTTGATCA GTACGGTCGA GCAAGCAGGC 

401 ATTCGTAA 

This corresponds to the amino acid sequence <SEQ ID 508; ORF113>:

1 ..GGGFINASCA TLTTAKPQYQ AGDLSAFKIR QGNVVIAGHG LDARDTDYTR
51 ILSYHSKIDA PVWGQDVRVV AGQNDVAATG DAHSPILNNA AANTSNNTAN
101 NGTHIPLFAI DTGKLGGXVC QQNHLDQYGR ASRHS*

Computer analysis of this amino acid sequence gave the following results: 
Homology with the pspA putative secreted protein of N. meningitidis (accession AF030941)
ORF113 and pspA show 44% aa identity in a 179aa overlap:

orfll3 GGGFINASCATLTTAKPQYQAGDLSAFKIRQGNWIAGHGLDARDTDYTRILSYHSKIDA 60 

GGG INA+ TLT+ P G+L+ F + G WI G GLD D DYTRILS ++I+A 
pspa GGGLINAASVTITSGVPVLNNGNLTGFDVSSGKWIGGKGLDTSDADYTRILSRAAEINA 256 

orf113 PVWGQDVRVVAGQNDVAATGDAHSPILXXXXXXXXXXXXXXGTHIPLFAIDTGKLGGMYA 120
VWG+DV+W+G+N + G + P AIDT LGGMYA 

pspa GVWGKDVKWSGKNKLDFDG SLAKTASAPSSSDSVTPTVAIDTATLGGMYA 307 

orfll3 . NKITLISTVEQAGIRNQGQWFASAGNVAVNAEGKLVNTGMIAATGENHAVSLHARNVHN 179 
+KITLIST A IRN+G+ FA+ G V ++A+GKL N+G I A +++ A+ V N

pspa DKITLISTDNGAVIRNKGRIFAATGGVTLSADGKLSNSGSIDAA EITISAQTVDN 362 

Homology with a predicted ORF from N. gonorrhoeae

ORF113 shows 86.5% identity in 52aa overlap at the N-terminal part and 94.1% identity in 17aa
overlap at the C-terminal part with a predicted ORF (ORF113ng) from N. gonorrhoeae:

orfll3 GGGFINASCATLTTAKPQYQAGDLSAFKIR 30 

11111)11 I I I I I :: I I I I I I I : I : j I I J 
orfll3ng SHPSQLNGYIEVGGRRAEWIANPAGIAVNGGGFINASRATLTTGQPQYQAGDFSGFKIR 224 

orf113 QGNVVIAGHGLDARDTDYTRILSYHSKIDAPVWGQDVRVVAGQNDVAATGDAHSPILNNA 90

I I I : I 1 I 1 1 I I M I I I I : I I I I 
orfll3ng QGNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 2 63 

orfll3 IDTGKLGGXVCQQNHLDQYGRASRHS 135 

||||||||||||:||||

orfll3ng DFSGFKIRQGNAVIAGHGLDARDTDFTRILVCQQNHLDQYGRTSRHS 2 63 

The complete length ORF113ng nucleotide sequence <SEQ ID 509> is predicted to encode a 
protein having amino acid sequence <SEQ ID 510>: 
1 MNKTLYRVIF NRKRGAVVAV AETTKREGKS CADSGSGSVY VKSVSFIPTH
51 SKAFCFSALG FSLCLALGTV NIAFADGIIT DKAAPKTQQA TILQTGNGIP
101 QVNIQTPTSA GVSVNQYAQF DVGNRGAILN NSRSNTQTQL GGWIQGNPWL
151 TRGEARVVVN QINSSHPSQL NGYIEVGGRR AEVVIANPAG IAVNGGGFIN
201 ASRATLTTGQ PQYQAGDFSG FKIRQGNAVI AGHGLDARDT DFTRILVCQQ
251 NHLDQYGRTS RHS*

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 61 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 511>:

1 . . TCAACGGGAC ATAGCGAACA AAATTACACT TTGCCGCGAG AAATCACACG 

51 CAACATTTCA CTGGGTTCAT TTGCCTATGA ATCGCATCGC AAAGCATTAA 

101 GCCATCATGC GCCCAGCCAA GGCACTGAGT TGCCGCAAAG CAACGGTATT 

151 TCGCTACCCT ATACGTCCAA TTCTTTTACC CCATTACCCA GCAGCAGCTT 

201 AT AC AT TAT C AATCCTGTCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

251 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCtGGACAGC 

301 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 

351 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

401 GTTTAGAcGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 

451 AATGGCGCGA CTGCGGCACG TTcGATGAAT CTCAGCGTTG GCATTGCATT 

501 AAGTGCCGAG CAAGTAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 

551 AAAAAGAAGT TAAGCTTCCT GATGGCGGCA CACAAACCGT ATTGGTGCCA 

601 CAGGTTTATG TACGCGTTAA AAATGGCGAC ATAGACGGTA AAGGTGCATT 

651 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 

701 CAGGCACGAT TGCAGGgCGC AATGCGCTTA TTATCAATAC CGATACGCTA 

751 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 

801 ACAAGACATC AATAATATTG GCGGCATGCT TTCTGCCGAA CAGACATTAT 

851 TGCTCAACGC AGGCAACAAC ATCAACAGCC AAAGCACCAC CGCCAGCAGT 

901 CAAAATACAC AAGGCAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 

951 TATCACAGGC AAAGAAAAAG GTGTTT . . 

This corresponds to the amino acid sequence <SEQ ID 512; ORF115>: 

1 . . STGHSEQNYT LPREITRNIS LGSFAYESHR KALSHHAPSQ GTELPQSNGI 

51 SLPYTSNSFT PLPSSSLYII NPVNKGYLVE TDPRFANYRQ WLGSDYMLDS 

101 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

151 NGATAARSMN LSVGIALSAE QVAQLTSDIV WLVQKEVKLP DGGTQTVLVP 

201 QVYVRVKNGD IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

251 DNIGGRIHAQ KSAVTATQDI NNIGGMLSAE QTLLLNAGNN INSQSTTASS 

301 QNTQGSSTYL DRMAGIYITG KEKGV . . 
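The correspondence between a partial DNA listing and its amino acid sequence can be checked by translating the DNA in reading frame 1 with the standard genetic code. The following is a minimal sketch under stated assumptions: the function name `translate` is hypothetical, an incomplete trailing codon is simply dropped, and any codon containing an ambiguous base is reported as X.

```python
import re

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(listing):
    """Translate a DNA listing in frame 1 (hypothetical helper).

    Position numbers, dots and spaces are stripped first; lower-case
    bases are upper-cased. Codons not in the table (for example those
    containing an ambiguity code) become 'X'.
    """
    dna = re.sub(r"[^A-Za-z]", "", listing).upper()
    usable = len(dna) - len(dna) % 3  # drop an incomplete trailing codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


# First row of the partial sequence <SEQ ID 511>.
first_row = "TCAACGGGAC ATAGCGAACA AAATTACACT TTGCCGCGAG AAATCACACG"
print(translate(first_row))  # STGHSEQNYTLPREIT
```

The printed residues agree with the start of <SEQ ID 512> (STGHSEQNYT LPREIT...).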

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N. meningitidis (accession number AF030941) 
ORF115 and pspA protein show 50% aa identity in 325aa overlap: 

Orfll5: 1 STGHSEQNYTLPREITRNISLGSFAYESHRKALSHHAPSQGTELPQSNGI SLPYTSNSFT 60 

STG+S Y E++ +1 +G AY+ + + P + NGI +T 
pspA: 778 STGYSRSPYEPAPEVS - S IRMGI S AYKGYAPQQAS D I PGTWPVVAENGIHPT FT 831 

Orfll5: 61 PLPSSSLYIINPVNKGYLVETDPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQR 120 

LP+SSL+ I P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+ 
pspA: 832 -LPNSSLFAIAPNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQK 890 

Orfll5: 121 LINEQIAELTGHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIV 180 

L+NEQIA+LTG+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQVA+LTSDIV 
pspA: 891 LVNEQIAKLTGYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIV 950 
[Remaining rows of the Orf115/pspA alignment (Orf115 up to residue 300, pspA 951 to 1069) are illegible in the source. Surviving match-line fragments: WL + V LPDG TQTVL P+VYVR; D++G+GALLSGS; SG+++N G IAG; R+AGIY+TG++ G] 

Homology with a predicted ORF from N. gonorrhoeae 

ORF115 shows 91.9% identity over a 334aa overlap with a predicted ORF (ORF115ng) from 
N. gonorrhoeae: 



orf 115. pep STGHSEQNYTLPREITRNISLGSFAYESHRK 31 

111 I I I I I I I : i I I I : M I I I I I I I [ I I 
orfll5ng NEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDISLGSFAYESHSK 71 

orfll5.pep ALSHHAPSQGTELPQSN GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET 81 

I I I : I I M I I I I I I I 1 I I I I I I I I I I I I I I I : I I I I I I I I : I I I | | | | | 

orfll5ng ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 131 

orf 115 . pep DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 141 

MM ! I I I I I I I I I I I I I I I I I I I I I I I I | I I | | | | | | | | | [ | | | | | | M I I I 

orfll5ng DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 191 

orf 115 . pep EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ 201 

I I I I I I 1 I I I I I I I I I I I M I I II ! I I I I I : I I I | I | | | | | | | | | ! | || I II I I I I I : I I 
orfll5ng EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 251 

orf 115. pep VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 261 

M I I I I I I I I II I I I I I I I I I I I II I I I I I I I 1 I I I I I I I I I | | | M | | | | | | | 

orfll5ng VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 311 

orf 115. pep SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK 321 

I I I I I I I I I I I II I : I I I I I I I I I I I I I I I I : I I I : I I I I : I I I I I I I I I | I I I ! I I I I 
orfll5ng SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK 371 

orf 11 5. pep EKGV , 9R 
I I I I 

orfll5ng EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 431 
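A percent identity figure such as those quoted for these alignments is the number of identical aligned positions divided by the length of the overlap. The sketch below illustrates that arithmetic on one 60-residue row pair of the alignment above (orf115.pep 142-201 against orf115ng 192-251). The function name `percent_identity` and the treatment of gap columns as mismatches are assumptions made for illustration, not a description of the program used in the original analysis.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Return (identical, overlap_length, percent) for two aligned rows.

    Both rows must already be aligned and of equal length. A column
    containing a gap character is counted in the overlap but never as
    identical (an assumption of this sketch).
    """
    if len(row_a) != len(row_b) or not row_a:
        raise ValueError("rows must be non-empty and of equal length")
    identical = sum(
        1 for a, b in zip(row_a.upper(), row_b.upper()) if a == b and a != gap
    )
    return identical, len(row_a), 100.0 * identical / len(row_a)


orf115_row = "EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ"
orf115ng_row = "EEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ"

identical, overlap, pct = percent_identity(orf115_row, orf115ng_row)
print(f"{identical}/{overlap} identical ({pct:.1f}%)")
```

Applied row by row and summed over the whole overlap, the same count gives the overall identity reported for the pair.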

An ORF115ng nucleotide sequence <SEQ ID 513> was predicted to encode a protein having amino 
acid sequence <SEQ ID 514>: 



1 MLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 
51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 
101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 
151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 
201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 
251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 
301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 
351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 
401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 
451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 
501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI 
551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 
601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 
651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL 
701 MPWRLPMQVG RLFKQAKAPK K* 



Further work revealed the following partial gonococcal DNA sequence <SEQ ID 515>: 
1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG 

51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG 

101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT 

151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA 

201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT 

251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT 

301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT 
351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC 
451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 
501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 
551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 
601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT 
651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 
701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA 
751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT 
801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 
851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA 
901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 
951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT 

1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT 

1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 

1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA 

1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC 

1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA 

1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA 

1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG 

1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG 

1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC 

1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC 

1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC 

1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT 

1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG 

1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 

1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT 

1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG 

1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA 

1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC 

2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG 

2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA 

2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA 

2151 GGCGCACAAA ACTTAG 

This corresponds to the amino acid sequence <SEQ ID 516; ORF115ng-1>: 

1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 

451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI 

551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 

651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL 

701 MPWRLPMQVG RPIKQAKAHK T* 

This gonococcal protein (ORF115ng-1) shows 91.9% identity with ORF115 over 334aa: 

20 30 40 50 60 70 

orf 115ng-l . p NE QT FGEKKVFSENGKLHN YWRARRKGHDETGHREQN YTLPEE ITRDISLGS FA YE SHSK 

Ml I I I I 1 I I : I I ! j : I ] I j I ! M I ! I I 

orf115 STGHSEQNYTLPREITRNISLGSFAYESHRK 

10 20 30 
80 90 100 110 120 130 

orfll5ng-l.p ALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYIINPANKGYLVET 
! ] I : I I I I 1 I I I I I I I I I I I I I I I I I I I I I I : I I I I I ! I I : 1 I M I II I 

or f 115 ALSHHAPSQGTELPQSN GISLPYTSNSFTPLPSSSLYIINPVNKGYLVET 

40 50 60 70 80 

140 150 160 170 180 190 

orfll5ng-l.p DPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 
I I I I I I II I II I I II I I II M I I I I I I I I I M I M I I II I II I II I I M I I I M 1 I I I I 

Orfll5 DPRFANYRQWLGSDYMLDSLKLDPNNLHKRLGDGYYEQRLINEQIAELTGHRRLDGYQND 
90 100 110 120 130 140 

200 210 220 230 240 250 

orf 115ng-l . p EEQFKALMDNGATAARSMNLSVG1ALSAEQAAQLTSDIVWLVQKEVKLPDGGTQTVLMPQ 
I I I I I I I I I I II I I I I I I II I I I II I I I I I : I I I II II II I I I I I I I I I I I I I I I! I : I I 
orf 115 EEQFKALMDNGATAARSMNLSVGIALSAEQVAQLTSDIVWLVQKEVKLPDGGTQTVLVPQ 

150 160 170 180 190 200 

260 270 280 290 300 310 

orfll5ng-l.p VYVRVKNGGIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 

I I I I I I I I I I I II II II I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I II I I M I I M 
orf 115 VYVRVKNGDIDGKGALLSGSNTQINVSGSLKNSGTIAGRNALIINTDTLDNIGGRIHAQK 

210 220 230 240 250 260 

320 330 340 350 360 370 

orf 115ng-l . p SAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTYLDRMAGIYITGK 
I! I I I II I I I I I I I : I I I I I I I I I I I M I I I : I II : I I I I : I I I I I I I I I I I II M M I 
orf 115 SAVTATQDINNIGGMLSAEQTLLLNAGNNINSQSTTASSQNTQGSSTYLDRMAGIYITGK 

270 280 290 300 310 320 

380 390 400 410 420 430 

orf 115ng-l . p EKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQEIHFDADNHTIR 

II 1 I 

orfll5 EKGV 

In addition, it shows homology with a secreted N. meningitidis protein in the database: 

gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273 

Score = 604 bits (1541), Expect = e-172 

Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%) 

Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60 

L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I 
Sbjct: 739 LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 796 

Query: 61 LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120 

+G AY+ + AP Q +++P +- + NGI +T LP SSL+ I 

Sbjct: 797 MGISAYKGY APQQASDIPGTV VPWAENGIHPTFT LPNSSLFAI 840 

Query: 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180 

P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT 
Sbjct: 841 APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900 

Query: 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240 

G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 
Sbjct: 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

Query: 241 DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299 

DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G IAGR ALI+N 
Sbjct: 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGSWDIG-SGAIENRGGLIAGREALILNAQN 1019 

Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359 

+N+G+ + A DI NGI AE LLL A NNI ++S +S+QN QGS 

Sbjct: 1020 IKNLQGDLQGKNIFAAAGSDITNTGSI-GAEHALLLKASNNIESRSETRSNQNEQGSVRN 1078 

Query: 360 LDRMAGIYITGKEKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQ 419 

+ R+AGIY+TG++ G + AG +1 + A +++NQS+ GQT L AG DI DT + Q 
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138 
Query: 420 EIHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITI 479 

FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DI + 
Sbjct: 1139 NTIFDSDNYVIRKEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 1198 

Query: 480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQWLQAGNDANILG 539 

+G + +DA K+TGRSGGG K +T ++ + AST +GK+++L +G D + G 
Sbjct: 1199 EAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQNGQAVSGTLDGKEIILVSGRDITVTG 1258 

Query: 540 SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598 

SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT GSK +TQ N+S 
Sbjct: 1259 SNIIADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318 

Query: 599 QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658 

++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + ++ 
Sbjct: 1319 ETVSHTESVVGSLNGNTLISAGKHYTQTGSTISSPQGDVGISSGKISIDAAQNRYSQESK 1378 

Query: 659 QTYEQKGLTVAFSSPVTD 676 

Q YEQKG+TVA S PV + 
Sbjct: 1379 QVYEQKGVTVAISVPVVN 1396 

Based on this analysis, it is predicted that the proteins from N.meningitidis and N.gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 62 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 517>: 

1 . . TCAGGGAATA ACCTCAATGC CAAAGCTGCC GAAGTCAGCA GCGCAAACGG 

51 TACACTCGCT GTGTCTGCCA ATAATGACAT CAACATCAGC GCAGGCATCA 

101 ACACGACCCA TGTTGATGAT GCGTCCAAAC ACACAGGCAG AAGCGGTGGT 

151 GGCAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACCGC 

201 CCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 

251 ATGCCAACAT CCTTGGCAGC AATGTTATTT CCGATAATGG CACCCAGATT 

301 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 

351 CGAAACCTAT CATCAAACCC AGAAATCAGG ATTGATGAGT GCAGGTATCG 

401 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 

451 AACGAACATA CAGGCAGTAC CGTAGGCAGC TTGAAAGGCG ATACCACCAT 

501 TGTTGCAGGC AAACACTACG AACAAATCGG CAGTACCGTT TCCAGCCCGG 

551 AAGGCAACAA TACCATCTAT GCCCAAAGCA TAGACATTCA AGCGGCACAC 

601 AACAAATTAA ACAGTAATAC CACCCAAACC TATGAACAAA AAGG.CTAAC 

651 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA . . . 

This corresponds to the amino acid sequence <SEQ ID 518; ORF117>: 

1 . . SGNNLNAKAA EVSSANGTLA VSANNDINIS AGINTTHVDD ASKHTGRSGG 

51 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTQI 

101 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 

151 NEHTGSTVGS LKGDTTIVAG KHYEQIGSTV SSPEGNNTIY AQSIDIQAAH 

201 NKLNSNTTQT YEQKXLTVAF SSPVTDLAQQ ... 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the pspA putative secreted protein of N. meningitidis (accession number AF030941) 
ORF117 and pspA protein show 45% aa identity in 224aa overlap: 

Orfll7: 4 NLNAKAAEVSSANGTLAVSANNDINISAGINTTHVDDASKHTGRSGGGNKLVITDKAQSH 63 

++ +AAEV S G L ++A DI + AG T +DA K+TGRSGGG K +T ++ 
pspA: 1173 D1RIRAAEVGSEQGRLKLAAGRDIKVEAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQ 1232 

Orf 117 : 64 HETAQSSTFEGKQWLQAGNDANILGSNVISDNGTQIQAGNHVRIGTTQTQSQSETYHQT 123 

+ AST +GK+++L +G D + GSN+I+DN T + A N++ + +T+S+S ++ 
pspA: 1233 NGQAVSGTLDGKEIILVSGRDITVTGSNIIADNHTILSAKNNIVLKAAETRSRSAEMNKK 1292 
Orf117: 124 QKSGLM-SAGIGFTIGSKTNTQENQSQSNEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSS 182 

+KSGLM S GIGFT GSK +TQ N+S++ HT S VGSL G+T I AGKHY Q GST+SS 
pspA: 1293 EKSGLMGSGGIGFTAGSKKDTQTNRSETVSHTESVVGSLNGNTLISAGKHYTQTGSTISS 1352 

Orfll7: 183 PEGNNTIYAQSIDIQAAHNKLNSNTTQTYEQKXLTVAFSSPVTD 226 

P+G+ 1+ IIAAN++ + Q YEQK +TVA S PV + 
pspA: 1353 PQGDVGISSGKISIDAAQNRYSQESKQVYEQKGVTVAISVPVVN 1396 



Homology with a predicted ORF from N. gonorrhoeae 

ORF117 shows 90% identity over a 230aa overlap with a predicted ORF (ORF117ng) from 
N. gonorrhoeae: 



orfll7.pep SGNNLNAKAAEVSSANGTLAVSANNDINIS 30 

orfll7ng IHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITIS 4 80 

orf 117 .pep AGINTTHVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQWLQAGNDANILGS 90 

= I I : : : I ! I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I | I M i ! I I I I I I II I 

orf117ng SGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILGS 540 

orf 117 .pep NVISDNGTQIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 150 

I M I M M : I I I I I I I I I I I I I I I I I I I I I I | I I | | | | | | | | | | | | | | | | | | | | | | j | | | 

orfll7ng NVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLMSAGIGFTIGSKTNTQENQSQS 600 

orf117.pep NEHTGSTVGSLKGDTTIVAGKHYEQIGSTVSSPEGNNTIYAQSIDIQAAHNKLNSNTTQT 210 

M I I I I M M I I I I I II I I : I I I I I I I : I I | | 1 | || | : | | : | | I I : I : I | | : I I ! I 

orfll7ng NEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTTQT 660 

orf117.pep YEQKXLTVAFSSPVTDLAQQ 230 
I I I I I I I I I I I I I I I I I I I 

orf117ng YEQKGLTVAFSSPVTDLAQQAIAVAHKAAKQFDKAKTTALMPWRLPMQVGRLFKQAKAPK 720 

An ORF117ng nucleotide sequence <SEQ ID 519> was predicted to encode a protein having amino 
acid sequence <SEQ ID 520>: 



1 . . LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 

51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 

101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 

151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 

201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 

251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 

301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 

351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 

401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 
451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 

501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI 
551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 
601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 
651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAK QFDKAKTTAL 
701 MPWRLPMQVG RLFKQAKAPK K* 

Further work revealed the following gonococcal partial DNA sequence <SEQ ID 521>: 

1 TTGCTTGTGC AAACAGAAAA AGACGGTTTG CATAACGAGC AAACCTTTGG 

51 CGAGAAGAAA GTCTTCAGCG AAAATGGTAA GTTGCACAAC TACTGGCGTG 

101 CGCGTCGTAA AGGACATGAT GAAACAGGGC ATCGTGAACA AAATTATACT 

151 TTGCCGGAGG AAATCACACG CGACATTTCA CTGGGTTCAT TTGCCTATGA 

201 ATCGCATAGC AAAGCATTAA GCCGTCATGC GCCCAGCCAA GGCACTGAGT 

251 TGCCACAAAG TAACCGGGAT AATATCCGTA CTGCGAAAAG CAACGGTATT 

301 TCGCTACCCT ATACGCCCAA TTCTTTTACC CCATTACCCG GCAGCAGCTT 

351 ATACATTATC AATCCTGCCA ATAAAGGCTA TCTTGTTGAA ACCGATCCAC 

401 GCTTTGCCAA CTACCGTCAA TGGTTGGGTA GTGACTATAT GCTGGGCAGC 

451 CTCAAACTAG ACCCAAACAA TTTACATAAA CGTTTGGGTG ATGGTTATTA 

501 CGAGCAACGT TTAATCAATG AACAAATCGC AGAGCTGACA GGGCATCGTC 

551 GTTTAGACGG TTATCAAAAC GACGAAGAAC AATTTAAAGC CTTAATGGAT 
601 AATGGCGCGA CTGCGGCACG TTCGATGAAT CTCAGCGTTG GCATTGCATT 
651 AAGTGCCGAG CAAGCAGCGC AACTGACCAG CGATATTGTT TGGTTGGTAC 
701 AAAAAGAAGT TAAACTTCCT GATGGCGGCA CACAAACCGT ATTGATGCCA 
751 CAGGTTTATG TACGCGTTAA AAATGGCGGC ATAGACGGTA AAGGTGCATT 
801 GTTGTCAGGC AGCAATACAC AAATCAATGT TTCAGGCAGC CTGAAAAACT 
851 CAGGCACGAT TGCAGGGCGC AATGCGCTTA TTATCAATAC CGATACGCTA 
901 GACAATATCG GTGGGCGTAT TCATGCGCAA AAATCAGCGG TTACGGCCAC 
951 ACAAGACATC AATAATATTG GCGGCATTCT TTCTGCCGAA CAGACATTAT 
1001 TGCTCAATGC GGGTAACAAC ATCAACAACC AAAGCACGGC CAAGAGCAGT 
1051 CAAAATGCAC AAGGTAGCAG CACCTACCTA GACCGAATGG CAGGTATTTA 
1101 TATCACAGGC AAAGAAAAAG GTGTTTTAGC AGCGCAGGCA GGCAAAGACA 
1151 TCAACATCAT TGCCGGTCAA ATCAGCAATC AATCAGATCA AGGGCAAACC 
1201 CGGCTGCAGG CAGGACGCGA CATTAACCTG GATACGGTAC AAACCGGCAA 
1251 ATATCAAGAA ATCCATTTTG ATGCCGATAA CCATACCATC CGAGGTTCAA 
1301 CGAACGAAGT CGGCAGCAGC ATTCAAACAA AAGGCGATGT TACCCtatTG 
1351 TCAGGGAATA ATCTCAATGC CAAAGCTGCC GAAGTCGGCA GCGCAAAAGG 
1401 CACACTTGCC GTGTATGCTA AAAATGACAT TACTATCAGC TCAGGCATCC 
1451 ATGCCGGCCA AGTTGATGAT GCGTCCAAAC ATACAGGCAG AAGCGGCGGC 
1501 GGTAATAAAT TAGTCATTAC CGATAAAGCC CAAAGTCATC ACGAAACTGC 
1551 TCAAAGCAGC ACCTTTGAAG GCAAGCAAGT TGTATTGCAG GCAGGAAACG 
1601 ATGCCAACAT CCTTGGCAGT AATGTTATTT CCGATAATGG CACCCGGATT 
1651 CAAGCAGGCA ATCATGTTCG CATTGGTACA ACCCAAACTC AAAGCCAAAG 
1701 CGAAACCTAT CATCAAACCC AAAAATCAGG ATTGATGAGT GCAGGTATCG 
1751 GCTTCACTAT TGGCAGCAAG ACAAACACAC AAGAAAACCA ATCCCAAAGC 
1801 AACGAACATA CAGGCAGTAC CGTAGGCAGC CTGAAAGGCG ATACCACCAT 
1851 TGTTGCAAGC AAACACTACG AACAAACCGG CAGCAACGTT TCCAGCCCTG 
1901 AGGGCAACAA CCTTATCAGC ACGCAAAGTA TGGATATTGG CGCAGCACAA 
1951 AACCAATTAA ACAGCAAAAC CACCCAAACC TACGAACAAA AAGGCTTAAC 
2001 GGTGGCATTC AGTTCGCCCG TTACCGATTT GGCACAACAA GCGATTGCCG 
2051 TAGCACACAA AGCAGCAAAC AAGTCGGACA AAGCAAAAAC GACCGCGTTA 
2101 ATGCCATGGC GGCTGCCAAT GCAGGTTGGC AGGCCTATCA AACAGGCAAA 
2151 GGCGCACAAA ACTTAG 



This corresponds to the amino acid sequence <SEQ ID 522; ORF117ng-1>: 



1 LLVQTEKDGL HNEQTFGEKK VFSENGKLHN YWRARRKGHD ETGHREQNYT 
51 LPEEITRDIS LGSFAYESHS KALSRHAPSQ GTELPQSNRD NIRTAKSNGI 
101 SLPYTPNSFT PLPGSSLYII NPANKGYLVE TDPRFANYRQ WLGSDYMLGS 
151 LKLDPNNLHK RLGDGYYEQR LINEQIAELT GHRRLDGYQN DEEQFKALMD 
201 NGATAARSMN LSVGIALSAE QAAQLTSDIV WLVQKEVKLP DGGTQTVLMP 
251 QVYVRVKNGG IDGKGALLSG SNTQINVSGS LKNSGTIAGR NALIINTDTL 
301 DNIGGRIHAQ KSAVTATQDI NNIGGILSAE QTLLLNAGNN INNQSTAKSS 
351 QNAQGSSTYL DRMAGIYITG KEKGVLAAQA GKDINIIAGQ ISNQSDQGQT 
401 RLQAGRDINL DTVQTGKYQE IHFDADNHTI RGSTNEVGSS IQTKGDVTLL 
451 SGNNLNAKAA EVGSAKGTLA VYAKNDITIS SGIHAGQVDD ASKHTGRSGG 
501 GNKLVITDKA QSHHETAQSS TFEGKQVVLQ AGNDANILGS NVISDNGTRI 
551 QAGNHVRIGT TQTQSQSETY HQTQKSGLMS AGIGFTIGSK TNTQENQSQS 
601 NEHTGSTVGS LKGDTTIVAS KHYEQTGSNV SSPEGNNLIS TQSMDIGAAQ 
651 NQLNSKTTQT YEQKGLTVAF SSPVTDLAQQ AIAVAHKAAN KSDKAKTTAL 
701 MPWRLPMQVG RPIKQAKAHK 



ORF117ng-l shows the same 90% identity over a 230aa overlap with ORF117. In addition, it 
shows homology with a secreted N. meningitidis protein in the database: 



gi|2623258 (AF030941) putative secreted protein [Neisseria meningitidis] Length = 2273 

Score = 604 bits (1541), Expect = e-172 
Identities = 325/678 (47%), Positives = 449/678 (65%), Gaps = 22/678 (3%) 



Query: 1 LLVQTEKDGLHNEQTFGEKKVFSENGKLHNYWRARRKGHDETGHREQNYTLPEEITRDIS 60 

L+V T + L N++T G K + ++ G LH Y R +KG D TG+ Y E++ I 
Sbjct: 739 LIVGTPESALDNDETLGTKTI-TDKGDLHRYHRHHKKGRDSTGYSRSPYEPAPEVS-SIR 796 

Query: 61 LGSFAYESHSKALSRHAPSQGTELPQSNRDNIRTAKSNGISLPYTPNSFTPLPGSSLYII 120 

+G AY+ + AP Q +++P + + NGI +T LP SSL+ I 

Sbjct : 797 MGISAYKGY APQQASDIPGTV VPWAENGIHPTFT LPNSSLFAI 840 

Query: 121 NPANKGYLVETDPRFANYRQWLGSDYMLGSLKLDPNNLHKRLGDGYYEQRLINEQIAELT 180 

P NKGYL+ETDP F +YR+WLGS YML +L+ DPN++HKRLGDGYYEQ+L+NEQIA+LT 
Sbjct: 841 APNNKGYLIETDPAFTDYRKWLGSGYMLAALQQDPNHIHKRLGDGYYEQKLVNEQIAKLT 900 

Query: 181 GHRRLDGYQNDEEQFKALMDNGATAARSMNLSVGIALSAEQAAQLTSDIVWLVQKEVKLP 240 
G+RRLDGY NDEEQFKALMDNG T A+ + L+ GIALSAEQ A+LTSDIVWL + V LP 
Sbjct: 901 GYRRLDGYTNDEEQFKALMDNGITIAKELQLTPGIALSAEQVARLTSDIVWLENETVTLP 960 

Query: 241 DGGTQTVLMPQVYVRVKNGGIDGKGALLSGSNTQINVSGSLKN-SGTIAGRNALIINTDT 299 
DG TQTVL P+VYVR + ++G+GALLSGS I SG+++N G IAGR ALI+N 
Sbjct: 961 DGTTQTVLKPKVYVRARPKDMNGQGALLSGSWDIG-SGAIENRGGLIAGREALILNAQN 1019 

Query: 300 LDNIGGRIHAQKSAVTATQDINNIGGILSAEQTLLLNAGNNINNQSTAKSSQNAQGSSTY 359 
+N+G+ + A DI NGI AE LLL A NNI ++S +S+QN QGS 
Sbjct: 1020 IKNLQGDLQGKNIFAAAGSDITNTGSI-GAEHALLLKASNNIESRSETRSNQNEQGSVRN 1078 

Query: 360 LDRMAGIYITGKEKGVLAAQAGKDINIIAGQISNQSDQGQTRLQAGRDINLDTVQTGKYQ 419 
+ R+AGIY+TG++ G + AG +I + A +++NQS+ GQT L AG DI DT + Q 
Sbjct: 1079 IGRVAGIYLTGRQNGSVLLDAGNNIVLTASELTNQSEDGQTVLNAGGDIRSDTTGISRNQ 1138 

Query: 420 EIHFDADNHTIRGSTNEVGSSIQTKGDVTLLSGNNLNAKAAEVGSAKGTLAVYAKNDITI 479 
FD+DN+ IR NEVGS+I+T+G+++L + ++ +AAEVGS +G L + A DI + 
Sbjct: 1139 NTIFDSDNYVIRKEQNEVGSTIRTRGNLSLNAKGDIRIRAAEVGSEQGRLKLAAGRDIKV 1198 

Query: 480 SSGIHAGQVDDASKHTGRSGGGNKLVITDKAQSHHETAQSSTFEGKQVVLQAGNDANILG 539 
+G + +DA K+TGRSGGG K +T ++ + AST +GK+++L +G D + G 
Sbjct: 1199 EAGKAHTETEDALKYTGRSGGGIKQKMTRHLKNQNGQAVSGTLDGKEIILVSGRDITVTG 1258 

Query: 540 SNVISDNGTRIQAGNHVRIGTTQTQSQSETYHQTQKSGLM-SAGIGFTIGSKTNTQENQS 598 
SN+I+DN T + A N++ + +T+S+S ++ +KSGLM S GIGFT GSK +TQ N+S 
Sbjct: 1259 SNIIADNHTILSAKNNIVLKAAETRSRSAEMNKKEKSGLMGSGGIGFTAGSKKDTQTNRS 1318 

Query: 599 QSNEHTGSTVGSLKGDTTIVASKHYEQTGSNVSSPEGNNLISTQSMDIGAAQNQLNSKTT 658 
++ HT S VGSL G+T I A KHY QTGS +SSP+G+ IS+ + I AAQN+ + ++ 
Sbjct: 1319 ETVSHTESVVGSLNGNTLISAGKHYTQTGSTISSPQGDVGISSGKISIDAAQNRYSQESK 1378 

Query: 659 QTYEQKGLTVAFSSPVTD 676 
Q YEQKG+TVA S PV + 
Sbjct: 1379 QVYEQKGVTVAISVPVVN 1396 



Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 63 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 523>: 

1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 

51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAwAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGyCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCAACGAAAC 

401 CTGCCGACGC GTCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA 

451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAT CCTGGTTTGA 

501 CGTGCGCATC GACTTCATCT CCTAT . . . 

This corresponds to the amino acid sequence <SEQ ID 524; ORF119>: 

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSXTSHVR 

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPXMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS ATKPADASAK PAPVPQTPAK 

151 PLITLKELSK VELSWFDVRI DFISY. . . 

Further work revealed the complete nucleotide sequence <SEQ ID 525>: 
1 ATGATTTACA TCGTACTGTT TCTAGCTGTC GTCCTCGCCG TTGTCGCCTA 
51 CAACATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 
101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 
151 GACGGCAAAC CGTCCGGCGG GTCAGTCATG ATGCCGAAAC CCCAACCGGC 
201 GGTCAAAAAA ACGGCAAAAC CCCAAGACCC CGCCATGCGC AACCTGCAAG 
251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 
301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 
351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 
401 CTGCCGACGC GCCGGCAAAA CCTGCACCCG TTCCGCAAAC ACCTGCAAAA 
451 CCGCTGATTA CGCTCAAAGA ACTGTCAAAA GTCGAATTAC CCTGGTTTGA 
501 CGTGCGCTTC GACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC 
551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC 
601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 
651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG 
701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA CGCATTCGCA 
751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 
801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG 
851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 
901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 
951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 
1001 AGCCGTTTAC CAACGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT 
1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 
1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC 
1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG 
1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 
1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA 



This corresponds to the amino acid sequence <SEQ ID 526; ORF119-1>: 

1 MIYIVLFLAV VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR 

51 DGKPSGGSVM MPKPQPAVKK TAKPQDPAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVSEPQTGHS APKPADAPAK PAPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA 

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 
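The position numbers that open each listing row make it possible to check a listing against a stated length. The sketch below is illustrative only: the helper name `parse_listing` is hypothetical, and it assumes each row begins with its start position followed by space-separated blocks of residues, with a trailing `*` marking the stop. It is applied to the last three rows of <SEQ ID 526>.

```python
import re


def parse_listing(rows):
    """Return (start_position, sequence) for numbered listing rows.

    Hypothetical helper: rows without a leading position number are
    skipped; spaces and the terminal '*' are removed from the residues.
    """
    start = None
    parts = []
    for row in rows:
        match = re.match(r"\s*(\d+)\s+(.*)", row)
        if match is None:
            continue
        if start is None:
            start = int(match.group(1))
        parts.append(re.sub(r"[^A-Za-z]", "", match.group(2)))
    return start, "".join(parts)


rows = [
    "301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS",
    "351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV",
    "401 RTYVLARQSE MLKVGIEPGG KTALRLFS*",
]
start, sequence = parse_listing(rows)
last_position = start + len(sequence) - 1
print(start, len(sequence), last_position)
```

The final residue falls at position 428, so the protein is 428 amino acids long.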

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF119 shows 93.7% identity over a 175aa overlap with an ORF (ORF119a) from strain A of 
N. meningitidis: 



MIYIVLFLAWLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM 

I : I I I I I I I I I I M I I I I I I M I I I I I I 1 | | | | | | | | | | | | | | | | | || || 

IIYIVLFLAAVLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 
10 20 30 40 50 60 

70 80 90 100 110 120 

MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 
M I I I I I I I I I I I III I I f I I I I I I I M I I I I I I I I I I I | | | | I I I I I I I I I I I I I | I 
MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 
70 80 90 100 110 120 

130 140 150 160 170 

TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY 

11 1 I N I I III: | | | | | | | 1 | | | | | | || Mllltlllll 

TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 

130 140 150 160 170 180 

AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 
190 200 210 220 230 240 



The complete length ORF119a nucleotide sequence <SEQ ID 527> is: 
1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA 

51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GGCACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA ACGGCAAAAT CCCAAGACCC CGCCATGCGC AACCTGCAAG 

251 AGCAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAAGCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTCCCG AACCCCAAAC CGGACATTCC GCACCAAAAC 

401 CTGCCGACGC GCCGGCAAAA CCTGTTCCCG TTCCGCAAAC GCCGGCAAAA 

451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA 

501 CGTGCGCTTC GACTTCATCT CTTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT TCCAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 

651 CTATCAGGCA TTTATCGTGG GTATTCAGGC AGTCAGCCGC AACGGACTTG 

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGTGGA TGCATTCGCA 

751 CACAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACTATCG 

851 CCATCCATTT GGTTTCCCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTATAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGCCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTG 

1201 CGCACTTATG TATTGGCTCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCAT TGCGCCTGTT CTCCTAA 

This encodes a protein having amino acid sequence <SEQ ID 528>: 



1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR 

51 DGKPSGGPVM MPKPQPAVKK TAKSQDPAMR NLQEQDAVYI AKQKQAKASP 

101 FKTEIETALE ESGIIGNSAH TVPEPQTGHS APKPADAPAK PVPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQVDAFA 

251 HSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 

ORF119a and ORF119-1 show 98.6% identity in 428 aa overlap: 

10 20 30 40 50 60 

orf 11 9a . pep MIYIVLFLAAVLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 

I 1 I I 1 I I I I : I I M I I I I II I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I [ M I 
orf 11 9-1 MI YI VL FLAWLAVVAYNMYQENQYRKKVRDQFGHS DKDAL LN S KT S HVRDGKP S GGS VM 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 1 19a . pep MPKPQPAVKKTAKSQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 

I I I I I I I I I M I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 

orf 11 9-1 MPKPQPAVKKTAKPQDPAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEESGIIGNSAH 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 11 9a. pep TVPEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 

II I I M I I I M N I I I I I I I : I I I I I I I I I I f i I ! I I I I I I I I I I I I I I I ] I I I I I | | | 
orf 11 9-1 TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 119a. pep AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVG1QAVSRNGLASQEELS 

I I I M I I I I I I I I I II I I I I I I I II I I I I I I I 1 | | | | | | | | | || | | | | | | | [| | 

orf 119-1 AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 11 9a. pep AFNRQVDAFAHSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS 

„,„ , I I I I I I I I I I : I M I I I I I I II I 

orf 119-1 AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS 

250 260 270 280 290 300 
310 320 330 340 350 360 

orf 119a. pep AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA 

I II M II II I I II I II I I I I I I I i I ! ! I I I II I I I M I I I I 1 I I I I I I 1 I M I I I I I M I 

orf 119-1 AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA 
310 320 330 340 350 360 



370 380 390 400 410 420 

orf 119a . pep GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 

M I I II I I I I M I I I M M M M 11 I I I I I I I II I I II I I I I I I I I I I I M M I I I I I I I 

orf 119-1 GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 
370 380 390 400 410 420 



429 

orf 11 9a. pep KTALRLFSX 
I I I II M II 
orfll9-l KTALRLFSX 



Homology with a predicted ORF from N.gonorrhoeae 

ORF119 shows 93.1% identity over a 175aa overlap with a predicted ORF (ORF119ng) from 
N.gonorrhoeae: 



orf 11 9. pep MIYIVLFLAWLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSXTSHVRDGKPSGGSVM 
I I I I I I I I I : I II II I I I I I I I I I I I I I I 1 II I I I I I 1 I I I I I I 11 II 

orfll9ng MIYIVLFLAAVLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 

orf 11 9. pep MPKPQPAVKKTAKPQDPXMRNLQEQDAVYIAKQKQABCASPFKTEIETALEESGIIGNSAH 
I I I I I I I I I I I I I I I I I I I I I I M I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 119ng MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQAKASPFKTEIETALEEIGIIGNSAH 

orf 119 .pep TVSEPQTGHSATKPADASAKPAPVPQTPAKPLITLKELSKVELSWFDVRIDFISY 
M I I I M I M I I I I II II I : I I II I I II I ! II II II II II I II I I I : I I I I i 
orfll9ng TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 

The complete length ORF119ng nucleotide sequence <SEQ ID 529> is: 

1 ATGATTTACA TCGTACTGTT CCTCGCCGCC GTCCTCGCCG TTGTCGCCTA 

51 CAATATGTAT CAGGAAAACC AATACCGCAA AAAAGTGCGC GACCAGTTCG 

101 GACACTCCGA CAAAGATGCC CTGCTCAACA GCAAAACCAG CCATGTCCGC 

151 GACGGCAAAC CGTCCGGCGG GCCAGTCATG ATGCCGAAAC CCCAACCGGC 

201 GGTCAAAAAA CCGGCCAAAC CCCAAGACTC CGCCATGCGC AACCTGCAAG 

251 AACAGGATGC CGTCTACATC GCCAAGCAGA AACAGGCAAA AGCCTCCCCG 

301 TTCAAAACCG AAATCGAAAC CGCCTTGGAA GAAATCGGCA TTATCGGCAA 

351 CTCCGCCCAC ACCGTTTCCG AACCCCAAAC CGGACATTCC GCACCGAAAC 

401 CTGCCGACGC GCCGGCAAAA CCCGTTCCCG TTCCGCAAAC GCCGGCAAAA 

451 CCGCTGATTA CGCTCAAAGA GCTGTCGAAG GTCGAGCTGC CCTGGTTTGA 

501 CGTGCGCTtc gACTTCATCT CCTATATCGC GCTGACCGAA GCCAAAGAAC 

551 TGCACGCACT GCCGCGCCTT tccAACCGCT GCCGCTACCA GATTGTCGGC 

601 TGCACCATGG ACGACCATTT CCAGATTGCC GAACCCATCC CGGGCATCCG 

651 CTATCAGGCA TTTATCGTGG GTATCCAGGC AGTCAGCCGC AACGGACTTG 

701 CCTCGCAGGA AGAACTCTCC GCATTCAACC GCCAGGCGGA CGCATTCGCA 

751 CAAAGCATGG GCGGTCAGAC GCTGCACACC GACCTTGCCG CCTTTATCGA 

801 AGTGGCTTCC GCACTGGACG CATTCTGCGC GCGCGTCGAC CAGACCATCG 

851 CCATCCATTT GGTTTCGCCG ACCAGCATCA GCGGCGTAGA ACTGCGTTCC 

901 GCCGTAACGG GCGTGGGTTT CGTTTTGGAA GACGACGGCG CGTTCCACTA 

951 TACCGACACG TCGGGCTCGA CCATGTTCTC CATCTGCTCG CTCAACAACG 

1001 AGCCGTTTAC CAATGCCCTT TTGGACAACC AGTCCTACAA AGGCTTCAGT 

1051 ATGCTGCTCG ACATCCCGCA CTCTCCGGCA GGCGAAAAAA CCTTCGACGA 

1101 TTTGTTTATG GATTTGGCGG TACGCCTGTC CGGTCAGTTG AACCTGAATC 

1151 TGGTCAACGA CAAAATGGAA GAAGTTTCGA CCCAATGGCT CAAAGACGTA 

1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA 

1251 ACCGGGCGGC AAAACCGCCC TGCGCCTGTT TTCATAA 
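
The listings in this document print each sequence in groups of ten residues, with a position number at the start of every row. As a minimal sketch (the helper name join_listing is an assumption for illustration, not part of the application), such a listing can be joined into one contiguous string by dropping everything that is not a letter:

```python
import re


def join_listing(listing: str) -> str:
    """Join a numbered, space-grouped sequence listing into one string.

    Position numbers, spaces, line breaks and placeholder dots are dropped;
    letters are kept and upper-cased.
    """
    return re.sub(r"[^A-Za-z]", "", listing).upper()


# Last two rows of <SEQ ID 529> as printed above.
rows = """
1201 CGCACTTATG TATTGGCGCG TCAGTCCGAG ATGCTCAAAG TCGGTATCGA
1251 ACCGGGCGGC AAAACCGCCC TGCGCCTGTT TTCATAA
"""

tail = join_listing(rows)
print(len(tail), tail[-7:])
```

Because only letters are kept, the same helper also tolerates split position numbers and stray spaces inside a group, but it cannot tell a residue letter from any other stray letter in the text.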

This encodes a protein having amino acid sequence <SEQ ID 530>: 

1 MIYIVLFLAA VLAVVAYNMY QENQYRKKVR DQFGHSDKDA LLNSKTSHVR 
51 DGKPSGGPVM MPKPQPAVKK PAKPQDSAMR NLQEQDAVYI AKQKQAKASP 
101 FKTEIETALE EIGIIGNSAH TVSEPQTGHS APKPADAPAK PVPVPQTPAK 

151 PLITLKELSK VELPWFDVRF DFISYIALTE AKELHALPRL SNRCRYQIVG 

201 CTMDDHFQIA EPIPGIRYQA FIVGIQAVSR NGLASQEELS AFNRQADAFA 

251 QSMGGQTLHT DLAAFIEVAS ALDAFCARVD QTIAIHLVSP TSISGVELRS 

301 AVTGVGFVLE DDGAFHYTDT SGSTMFSICS LNNEPFTNAL LDNQSYKGFS 

351 MLLDIPHSPA GEKTFDDLFM DLAVRLSGQL NLNLVNDKME EVSTQWLKDV 

401 RTYVLARQSE MLKVGIEPGG KTALRLFS* 

ORF119ng and ORF119-1 show 98.4% identity over 428 aa overlap: 



10 20 30 40 50 60 

orf119ng MIYIVLFLAAVLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGPVM 
I I I I I I I I I : I I I I I I 1 I ! I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I [ i 
orf119-1 MIYIVLFLAWLAWAYNMYQENQYRKKVRDQFGHSDKDALLNSKTSHVRDGKPSGGSVM 

70 80 90 100 110 120 

MPKPQPAVKKPAKPQDSAMRNLQEQDAVYIAKQKQABCASPFKTEIETALEEIGIIGNSAH 
I I I N I I I I I Mill M I I I ! I I I I i 11 I I I I I I I I I I I I I I | I | | | | | III 
MPKPQPAVKKTAKPQDPAMRNLQEQDAVY1AKQKQAKASPFKTEIETALEESGIIGNSAH 
70 80 90 100 110 120 

130 140 150 160 170 180 

TVSEPQTGHSAPKPADAPAKPVPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 
M I II I I I I I I i I I 1 1 I I i I I : I I I I I I 1 I I I I I I I I I I | I | | I | | | | | | | | | 1 M | | | | 
TVSEPQTGHSAPKPADAPAKPAPVPQTPAKPLITLKELSKVELPWFDVRFDFISYIALTE 

130 140 150 160 170 180 

190 200 210 220 230 240 

AKELHALPRL SNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 

I I I I I I I I M I N i I I I I I II I I M M I I M I I I I I I I I I I I I I M M M II I I I I I I I I 
AKELHALPRLSNRCRYQIVGCTMDDHFQIAEPIPGIRYQAFIVGIQAVSRNGLASQEELS 
190 200 210 220 230 240 

250 260 270 280 290 300 

AFNRQADAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS 
I M I I : I II M I I I I II I I 11 i II I II M I I I I I I I I I I I I II II I I || || | M | | | || | 
AFNRQVDAFAQSMGGQTLHTDLAAFIEVASALDAFCARVDQTIAIHLVSPTSISGVELRS 

250 260 270 280 290 300 

310 320 330 340 350 360 

AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA 

N I I I I I I I I I I I I I I I I I I M M M II II I I I I I I I I I M I 1 I i I I I I I I I 1 M I M I I 

AVTGVGFVLEDDGAFHYTDTSGSTMFSICSLNNEPFTNALLDNQSYKGFSMLLDIPHSPA 
310 320 330 340 350 360 

370 380 390 400 410 420 

GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 

N I I I I I I I I I II I I I I I I 11 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

GEKTFDDLFMDLAVRLSGQLNLNLVNDKMEEVSTQWLKDVRTYVLARQSEMLKVGIEPGG 

370 380 390 400 410 420 

429 

KTALRLFSX 
I I I I I I I I I 
KTALRLFSX 



Based on this analysis, including the presence of a putative leader sequence in the gonococcal 
protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 64 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 531>: 






1 ..GCGCGGCACG GCACGGAAGA TTTCTTCATG AACAACAGCG ACAC.ATCAG 

51 GCAGATAGTC GAAAGCACCA CCGGTACGAT GAAGCTGCTG ATTTCCTCCA 

101 TCGCCCTGAT TTCATTGGTA GTCGGCGGCA TCGGCGTGAT GAACATCATG 

151 CTGGTGTCCG TTACCGAGCG CACCAAAGAA ATCGGCATAC GGATGGCAAT 

201 CGGCGCGCGG CGCGGCAATA TTTyGCAGCA GTTTTTGATT GAGGCGGTGT 

251 TAATCTGCGT CATCGGCGGT TTGGTCGGCG TGGGTTTGTC CGCCGCCGTC 

301 AGCCTCGTGT TCAATCATTT TGTAACCGAC TTCCCGATGG ACATTTCCGC 

351 CATGTCCGTC ATCGGCGCGG TCGCCTGTTC GACCGGAATC GGCATCGCGT 

401 TCGGCTTTAT GCCTGCCAAT AAAGCAGCCA AACTCAATCC GATAGACGCA 

451 TTGGCACAGG ATTGA 

This corresponds to the amino acid sequence <SEQ ID 532; ORF134>: 

1 . .ARHGTEDFFM NNSDXIRQIV ESTTGTMKLL ISSIALISLV VGGIGVMNIM 

51 LVSVTERTKE IGIRMAIGAR RGNIXQQFLI EAVLICVIGG LVGVGLSAAV 

101 SLVFNHFVTD FPMDISAMSV IGAVACSTGI GIAFGFMPAN KAAKLNPIDA 

151 LAQD* 

Further work revealed the complete nucleotide sequence <SEQ ID 533>: 

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT 

51 GCTCGGCATC ATCATCGGTA TCGCGTCGGT GGTTTCCGTC GTCGCATTGG 

101 GCAATGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG 

151 AACACCATCA GCATCTTCCC GGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT 

301 TACCGCAACA CCGACCTGAC CGCCTCGCTT TACGGCGTGG GCGAACAATA 

351 TTTCGACGTG CGCGGACTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA 

401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA 

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG 

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 

601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 

651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA 

701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC 
751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC 
801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA 

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 
901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 

951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC 

1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGACG 

1151 CATTGGCACA GGATTGA 

This corresponds to the amino acid sequence <SEQ ID 534; ORF134-1>: 

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT 

51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT 

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK 

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM 

201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI 

251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA 

301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS 

351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD* 
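
The correspondence between a nucleotide listing such as <SEQ ID 533> and its amino acid sequence can be spot-checked by translating codon by codon. The sketch below is an illustration only: it assumes the standard genetic code, reads from the first base in frame, and uses the hypothetical names CODON_TABLE and translate. Codons containing an ambiguity symbol (such as the "y" or "." in the partial sequences) are reported as X, which is how the listings here mark unresolved residues.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second and third base in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; unknown or ambiguous codons become 'X'."""
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)


# First 30 bases and last 27 bases of <SEQ ID 533> as printed above.
print(translate("ATGTCGGTGCAAGCAGTATTGGCGCACAAA"))  # MSVQAVLAHK
print(translate("CCGATAGACGCATTGGCACAGGATTGA"))     # PIDALAQD*
```

Both fragments agree with the start and end of <SEQ ID 534>. The stop codon is printed as an asterisk, matching the convention of the listings.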

Computer analysis of this amino acid sequence gave the following results: 

Homology with the hypothetical protein o648 of E. coli (accession number AE000189) 
ORF134 and o648 protein show 45% aa identity in 153aa overlap: 

Orf134: 2 RHGTEDFFMNNSDXIRQIVESTTGTMKXXXXXXXXXXXVVGGIGVMNIMLVSVTERTKEI 61 

RHG +DFF N D + + VE TT T++ VVGGIGVMNIMLVSVTERT+EI 
o648: 496 RHGKKDFFTWNMDGVLKTVEKTTRTLQLFLTLVAVISLVVGGIGVMNIMLVSVTERTREI 555 

Orf134: 62 GIRMAIGARRGNIXQQFLIEAXXXXXXXXXXXXXXXXXXXXXFNHFVTDFPMDISAMSVI 121 

GIRMA+GAR ++ QQFLIEA p+ + + s ++++ 

o648: 556 GIRMAVGARASDVLQQFLIEAVLVCLVGGALGITLSLLIAFTLQLFLPGWEIGFSPLALL 615 

Orf134: 122 GAVACSTGIGIAFGFMPANKAAKLNPIDALAQD 154 

A CST GI FG++PA AA+L+P+DALA++ 
o648: 616 LAFLCSTVTGILFGWLPARNAARLDPVDALARE 648 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF134 shows 98.7% identity over a 154aa overlap with an ORF (ORF134a) from strain A of N. 
meningitidis: 

10 20 30 

orf 134 .pep ARHGTEDFFMNNSDXIRQIVESTTGTMKLL 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 134a GESHTNSITVKIKDNANTQVAEKGLTDLLBCARHGTEDFFMNNSDSIRQIVESTTGTMKLL 
210 220 230 240 250 260 

40 50 60 70 80 90 

orf 134 . pep ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG 
1 I I I I I I I I I I I I I I I I II I I I I ( 1 I I I 1 I II M I I I II I I I I I I M M I I I N I M II 

orf 134a ISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICVIGG 
270 280 290 300 310 320 



LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 
330 340 350 360 370 380 



orf 134. pep LAQDX 
I I I I I 

orfl34a LAQDX 

The complete length ORF134a nucleotide sequence <SEQ ID 535> is: 

1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACGAT 

51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCATTGG 

101 GCAACGGTTC GCAGAAAAAA ATCCTTGAAG ACATCAGTTC GATAGGGACG 

151 AACACCATCA GCATCTTCCC AGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAGGATTAAA ACCCTGACCA TAGACGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC TTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACT 

301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA 
351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGACGAAA 

401 ACGATGTGAA AGAAGACGCG CAGGTCGTCG TCATCGACCA AAATGTCAAA 
451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG 
501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 
551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 
601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 
651 AGACAATGCC AATACCCAGG TTGCCGAAAA AGGGCTGACC GATCTGCTCA 

701 AAGCGCGGCA CGGCACGGAA GATTTCTTCA TGAACAACAG CGACAGCATC 
751 AGGCAGATAG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC 

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGCGTG ATGAACATCA 
851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 
901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 
951 GTTAATCTGC GTCATCGGCG GTTTGGTCGG CGTGGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ACTTCCCGAT GGACATTTCC 

1051 GCCATGTCCG TCATCGGCGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAAGCAGC CAAACTCAAT CCGATAGATG 

1151 CATTGGCGCA GGATTGA 

This encodes a protein having amino acid sequence <SEQ ID 536>: 

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSIGT 

51 NTISIFPGRG FGDRRSGRIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT 

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK 

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM 

201 HQITGESHTN SITVKIKDNA NTQVAEKGLT DLLKARHGTE DFFMNNSDSI 

251 RQIVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA 
301 IGARRGNILQ QFLIEAVLIC VIGGLVGVGL SAAVSLVFNH FVTDFPMDIS 
351 AMSVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD* 

ORF134a and ORF134-1 show 100.0% identity in 388 aa overlap: 

orf 134a. pep MS VQAVLAHKMRS LLTMLGI 1 1 GI AS WS WALGNG S QKKI LEDI S SIGTNTISI FPGRG 
I I I I I I I I I I I I I I I I 1 1 I I I [ I I I! I I I I I I I I I I I I 1 I 1 I 1 I 1 I 1 I 1 I I I 1 I M I I M 
orf 134-1 MSVQAVLAHKMRSLLTMLGIIIGIASWSVVALGNGSQKKILEDISSIGTNTISIFPGRG 

orf 134a . pep FGDRRSGRIKTLTID DAK 1 I AKQ S Y VAS AT PMT S S GG T L T YRNT DLTAS L YGVGE QY FD V 
I I I I I I I I I I I I 1 I I i I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 
orf 134-1 FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 

orf 134a . pep RGLKLET GRL FDENDVKEDAQVWI DQNVKDKLFAD S DPLGKT ILFRKRPLTVIGVMKKD 
I I I I M I I I I II M I I I I I I I II I I I I I I I II I I I I I I I ! I I I I I I I I I 1 I I I I I I I I II 
orf 134-1 RGLKLETGRLFDENDVKEDAQVWIDQNVKDKLFADS DPLGKT ILFRKRPLTVIGVMKKD 

orf 134a. pep ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE 
I I I I I I II II I I I I I I II I I I I i I I I I I M II II II II II II II M M I II i M M II II 

or f 1 3 4 - 1 ENAFGN S DVLMLWS PYTTVMHQITGESHTNS ITVKIKDNANTQVAEKGLTDLLKARHGTE 

orf 134a . pep DFFMNNSDSIRQIVESTTGTMKLLISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMA 
I I I I I I I I I I I I I I I I I I II I I I I I I I I I I II I II II I I I I I I I I I I I I i I I 1 I I I II II 
orf 134-1 DFFMNNSDSIRQIVESTTGTMKLLISSIALISLWGG1GVMNIMLVSVTERTKEIGIRMA 

orf 134a. pep IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC 

I I I I I I II M I I I I I I I I I I 1 II I I I 1 II I I I I I I I I I I II I I II II II II M I I I I I I I 
orf 134-1 IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC 

orf 13 4a. pep STGIGIAFGFMPANKAAKLNPIDALAQDX 

II II N II I I I I II I I I I I I I II M I I I I 
orf 134-1 STGIGIAFGFMPANKAAKLNPIDALAQDX 

Homology with a predicted ORF from N. gonorrhoeae 

ORF134 shows 96.8% identity over a 154aa overlap with a predicted ORF (ORF134ng) from N. 
gonorrhoeae: 

orf 134. pep ARHGTEDFFMNNSDXIRQIVESTTGTMKLL 30 

I 1 II I I I I I I I I I I I I I : I I I I I I I I I I I 
orfl34ng GESHTNSITVKIKDNANTRVAEKGLAELLKARHGTEDFFMNNSDSIRQMVESTTGTMKLL 2 64 

orf 134 .pep ISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNIXQQFLIEAVLICVIGG 90 

I I I I 1 I I I I I I I I I I I I I II I I I I I I I I I II I S I I II M I I I II N I M I M I I I : i i I 
orfl34ng ISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMAIGARRGNILQQFLIEAVLICIIGG 324 

orf 134 .pep LVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 150 

I I I I I 1 I I I I I M I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I II I I I I I I I I I I 
orfl34ng LVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVACSTGIGIAFGFMPANKAAKLNPIDA 384 



orf134.pep LAQD 154 
orf134ng LAQD 388 



The complete length ORF134ng nucleotide sequence <SEQ ID 537> is: 



1 ATGTCGGTGC AAGCAGTATT GGCGCACAAA ATGCGTTCGC TTCTGACCAT 

51 GCTCGGCATC ATCATCGGTA TCGCTTCGGT TGTCTCCGTC GTCGCGCTGG 

101 GCAACGGTTC GCAGAAAAAA ATCCTCGAAG ACATCAGTTC GATGGGGACG 

151 AACACCATCA GCATCTTCCC CGGGCGCGGC TTCGGCGACA GGCGCAGCGG 

201 CAAAATCAAA ACCCTGACCA TAGAGGACGC AAAAATCATC GCCAAACAAA 

251 GCTACGTTGC CTCCGCCACG CCCATGACTT CGAGCGGCGG CACGCTGACC 

301 TACCGCAATA CCGACCTGAC CGCTTCTTTG TACGGTGTGG GCGAACAATA 

351 TTTCGACGTG CGCGGGCTGA AGCTGGAAAC GGGGCGGCTG TTTGATGAGA 

401 ACGATGTGAA AGAAGACGCG CAAGTCGTCG TCATCGACCA AAATGTCAAA 

451 GACAAACTCT TTGCGGACTC GGATCCGTTG GGTAAAACCA TTTTGTTCAG 

501 GAAACGCCCC TTGACCGTCA TCGGCGTGAT GAAAAAAGAC GAAAACGCTT 

551 TCGGCAATTC CGACGTGCTG ATGCTTTGGT CGCCCTATAC GACGGTGATG 



601 CACCAAATCA CAGGCGAGAG CCACACCAAC TCCATCACCG TCAAAATCAA 

651 AGACAATGCC AATACCCGGG TTGCCGAAAA AGGGCTGGCC GAGCTGCTCA 

701 AAGCACGGCA CGGCACGGAA GACTTCTTTA TGAACAACAG CGACAGCATC 

751 AGGCAGATGG TCGAAAGCAC CACCGGTACG ATGAAGCTGC TGATTTCCTC 

801 CATCGCCCTG ATTTCATTGG TAGTCGGCGG CATCGGTGTG ATGAACATTA 

851 TGCTGGTGTC CGTTACCGAG CGCACCAAAG AAATCGGCAT ACGGATGGCA 

901 ATCGGCGCGC GGCGCGGCAA TATTTTGCAG CAGTTTTTGA TTGAGGCGGT 

951 GTTAATCTGC ATCATCGGAG GCTTGGTCGG CGTAGGTTTG TCCGCCGCCG 

1001 TCAGCCTCGT GTTCAATCAT TTTGTAACCG ATTTCCCGAT GGACATTTCG 

1051 GCGGCATCCG TTATCGGGGC GGTCGCCTGT TCGACCGGAA TCGGCATCGC 

1101 GTTCGGCTTT ATGCCTGCCA ATAAGGCAGC CAAACTCAAT CCGATAGATG 

1151 CATTGGCGCA GGATTGA 

This encodes a protein having amino acid sequence <SEQ ID 538>: 

1 MSVQAVLAHK MRSLLTMLGI IIGIASVVSV VALGNGSQKK ILEDISSMGT 

51 NTISIFPGRG FGDRRSGKIK TLTIDDAKII AKQSYVASAT PMTSSGGTLT 

101 YRNTDLTASL YGVGEQYFDV RGLKLETGRL FDENDVKEDA QVVVIDQNVK 

151 DKLFADSDPL GKTILFRKRP LTVIGVMKKD ENAFGNSDVL MLWSPYTTVM 

201 HQITGESHTN SITVKIKDNA NTRVAEKGLA ELLKARHGTE DFFMNNSDSI 

251 RQMVESTTGT MKLLISSIAL ISLVVGGIGV MNIMLVSVTE RTKEIGIRMA 

301 IGARRGNILQ QFLIEAVLIC IIGGLVGVGL SAAVSLVFNH FVTDFPMDIS 

351 AASVIGAVAC STGIGIAFGF MPANKAAKLN PIDALAQD* 

ORF134ng and ORF134-1 show 97.9% identity in 388 aa overlap: 

orfl34ng MSVQAVLAHKMRSLLTMLGIIIGIASVVSWALGNGSQKKILEDISSMGTNTISIFPGRG 
I I I I M I I I II I I I I I M M I I I M I I I I I I I II I I I I I | I | | | | | ] : | | | | | | | | | | | | 
orf 134-1 MSVQAVLAHKMRSLLTMLGI I IGIAS WSWALGNGSQKKILEDI SSIGTNTI S IFPGRG 

orfl34ng FGDRRSGKIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 
f N M ( I : I I I I I I I M I I I I I I I I I II M II N I II I I I I I I I I I I I | | I | | | | M I I I 
orf 13 4-1 FGDRRSGRIKTLTIDDAKIIAKQSYVASATPMTSSGGTLTYRNTDLTASLYGVGEQYFDV 

orfl34ng RGLKLETGRL FDENDVKEDAQVVVIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD 

I I I 1 I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I | I | | | | | | | | | | | | | | | | | | | | | 
orf 134-1 RGLKLETGRLFDENDVKEDAQVWIDQNVKDKLFADSDPLGKTILFRKRPLTVIGVMKKD 

orfl34ng ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTRVAEKGLAELLKARHGTE 

I N I II I I I M N M I I I I I M I I I I M I M I I I I I I M I I I : I I I I I I :: I I 1 I I I I I I 
orf 134-1 ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE 

orfl34ng DFFMNNSDSIRQMVESTTGTMKLLISSIALISLWGGIGVMNIMLVSVTERTKEIGIRMA 
I I I I M I I I I I I : I I M I i I I I I I I M I M I I I I I I I I I I I | | I | | | | | | | | | | | | | | | j 
orf 134-1 DFFMNNSDSIRQIVESTTGTMKLLISSIALISLVVGGIGVMNIMLVSVTERTKEIGIRMA 

orfl34ng IGARRGNILQQFLIEAVLICIIGGLVGVGLSAAVSLVFNHFVTDFPMDISAASVIGAVAC 
I I I I I I I I III I I I I I M M : I M I I I M M II M I I M M II I N I I I I I ! I M I I I I 
orf 134-1 IGARRGNILQQFLIEAVLICVIGGLVGVGLSAAVSLVFNHFVTDFPMDISAMSVIGAVAC 

orf 13 4ng STGIGIAFGFMPANKAAKLNPIDALAQDX 
M I I II II II I [ II II I II II I M I I I M 
orf 1 3 4-1 STGIGIAFGFMPANKAAKLNPIDALAQDX 
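
Identity figures of the kind quoted for these alignments can be approximated by counting matching positions in two aligned sequences. The sketch below is a simplified illustration and not the program used to produce the alignments: the function name percent_identity is hypothetical, both inputs are assumed to be already aligned and of equal length, and gap positions (written "-") are simply left out of the count.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical residues between two already-aligned sequences.

    Positions where either sequence has a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no aligned residues to compare")
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Residues 181-240 of ORF134ng and ORF134-1, taken from the listings above.
orf134ng = "ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTRVAEKGLAELLKARHGTE"
orf134_1 = "ENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTQVAEKGLTDLLKARHGTE"

print(percent_identity(orf134ng, orf134_1))  # 95.0
```

This 60-residue window contains three of the differences between the two proteins (R/Q, A/T and E/D), so its identity of 57/60 is lower than the 97.9% reported over the full 388 aa overlap.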

ORF134ng also shows homology to an E. coli ABC transporter: 

sp|P75831|YBJZ_ECOLI HYPOTHETICAL ABC TRANSPORTER ATP-BINDING PROTEIN YBJZ >g 
(AE000189) o648; similar to YBBA_HAEIN SW: P45247 [Escherichia coli] Length = 

Score = 297 bits (753), Expect = 6e-80 

Identities = 162/389 (41%), Positives = 230/389 (58%), Gaps = 1/389 (0%) 

Query: 1 M S VQAVLAHKMRS LLTMLXXXXXXXXXXXXXXLGNGS QKKI LED I S SMGTNT I S I FPGRG 60 

M+ +A+ A+KMR+LLTML +G+ +++ +L DI S+GTNTI ++PG+ 

Sbjct: 260 lyiAWRALAANKMRTLLTMLGIIIGIASVVSIVVVGDAAKQMVLADIRSIGTNTIDVYPGKD 319 

Query: 61 FGDRRSGKIKTLTIDDAKIIAKQSYVASATPMTSSGGTLT YRNTDLTASL YGVGEQYFDV 120 

FGD + L DD I KQ +VASATP S L Y N D+ AS GV YF+V 

Sbjct: 320 FGDDDPQYQQALKYDDLIAIQKQPWVASATPAVSQNLRLRYNNVDVAASANGVSGDYFNV 379 
Query: 121 RGLKLETGRLFDENDVKEDAQVVVIDQNVKDKLFAD-SDPLGKTILFRKRPLTVIGVMKK 179 

G+ G F++ + AQVW+D N + +LF +D +G+ IL P VIGV ++ 
Sbjct: 380 YGMTFSEGNTFNQEQLNGRAQWVLDSNTRRQLFPHKADWGEVILVGNMPARVIGVAEE 439 

Query: 180 DENAFGNSDVLMLWSPYTTVMHQITGESHTNSITVKIKDNANTRVAEKGLAELLKARHGT 239 

++ FG+S VL +W PY+T+ ++ G+S NSITV++K+ ++ AE+ L LL RHG 
Sbjct: 440 KQSMFGSSKVLRWLPYSTMSGRVMGQSWLNSITVRVKEGFDSAEAEQQLTRLLSLRHGK 499 

Query: 240 EDFFMNNSDSIRQMVESTTGTMKXXXXXXXXXXXVVGGIGVMNIMLVSVTERTKEIGIRM 299 

+DFF N D + + VE TT T++ WGGIGVMNIMLVSVTERT+EIGIRM 
Sbjct: 500 KDFFTWNMDGVLKTVEKTTRTLQLFLTLVAVISLWGGIGVMNIMLVSVTERTREIGIRM 559 

Query: 300 AIGARRGNILQQFLIEXXXXXXXXXXXXXXXXXXXXXXFNHFVTDFPMDISAASVIGAVA 359 
A+GAR ++LQQFLIE F+ + + S +++ A 

Sbjct: 560 AVGARASDVLQQFLIEAVLVCLVGGALGITLSLLIAFTLQLFLPGWEIGFSPLALLLAFL 619 

Query: 360 CSTGIGIAFGFMPANKAAKLNPIDALAQD 388 
CST GI FG++PA AA+L+P+DALA++ 
Sbjct: 620 CSTVTGILFGWLPARNAARLDPVDALARE 648 

Based on this analysis, including the presence of the leader peptide and transmembrane regions in 
the gonococcal protein, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 65 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 539>: 

1 ..GGGACGGGAG CGATGCTGCT GCTGTTTTAC GCGGTAACGA T.CTGCCTTT 

51 GGCCACTGGC GTTACCCTGA GTTACACCTC GTCGATTTTT TTGGCGGTAT 

101 TTTCCTTCCT GATTTTGAAA GAACGGATTT CCGTTTACAC GCAGGCGGTG 

151 CTGCTCCTTG GTTTTGCCGG CGTGGTATTG CTGCTTAATC CCTCGTTCCG 

201 CAGCGGTCAG GAAACGGCGG CACTCGCCGG GCTGGCGGGC GGCGCGATGT 

251 CCGGCTGGGC GTATTTGAAA GTGCGCGAAC TGTCTTTGGC GGGCGAACCC 

301 GGCTGGCGCG TCGTGTTTTA CCTTTCCGTG ACAGGTGTGG CGATGTCGTC 

351 GGTTTGGGCG ACGCTGACCG GCTGGCACAC CCTGTCCTTT CCATCGGCAG 

401 TTTATCTGTC GTGCATCGGC GTGTCCGCGC TGATTGCCCA ACTGTCGATG 

451 ACGCGCGCCT ACAAAGTCGG CGACAAATTC ACGGTTGCCT CGCTTTCCTA 

501 TATGACCGTC GTTTTTTCCG CTCTGTCTGC CGCATTTTTT CTGGGCGAAG 

551 AGCTTTTCTG GCAGGAAATA CTCGGTATGT GCATCATCAT CCTCAGCGGT 

601 ATTTTGA 

This corresponds to the amino acid sequence <SEQ ID 540; ORF135>: 

1 ..GTGAMLLLFY AVTILPLATG VTLSYTSSIF LAVFSFLILK ERISVYTQAV 

51 LLLGFAGVVL LLNPSFRSGQ ETAALAGLAG GAMSGWAYLK VRELSLAGEP 
101 GWRVVFYLSV TGVAMSSVWA TLTGWHTLSF PSAVYLSCIG VSALIAQLSM 

151 TRAYKVGDKF TVASLSYMTV VFSALSAAFF LGEELFWQEI LGMCIIISAV 
201 F* 

Further work revealed the complete nucleotide sequence <SEQ ID 541>: 

1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA mCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACTGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA 

451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCGTCGGT TTGGGCGACG 






601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG 

651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA 

701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT 

751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GGCGAAGAGC TTTTCTGGCA 

801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA 

851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA 

901 TAA 

This corresponds to the amino acid sequence <SEQ ID 542; ORF135-1>: 

1 MDTAKKDILG SGWMLVAAAC FTIMNVLIKE ASAKFALGSG ELVFWRMLFS 
51 TVALGAAAVL RRDXFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLATGV 
101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE 
151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSVT GVAMSSVWAT 
201 LTGWHTLSFP SAVYLSCIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV 
251 FSALSAAFFL GEELFWQEIL GMCIIILSGI LSSIRPTAFK QRLQSLFRQR 
301 * 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF135 shows 99.0% identity over a 197 aa overlap with an ORF (ORF135a) from strain A of N. 
meningitidis: 



10 20 30 

orf 135 . pep GTGAMLL L FYAVT I L PLATGVT LSYTSSIF 

I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I 
o r f 1 3 5 a STVALGAAAVLRRDT FRT PHWKNHLNRSMVGTGAMLLL FYAVTHLPLAT GVTLSYTSS I F 

50 60 70 80 90 100 

40 50 60 70 80 90 

orf 135 . pep LAVFSFLILKERISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLK 
1 I I I I I I I I I I I I I II I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | 
orf 135a LAVFSFLILKERISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLK 
110 120 130 140 150 160 

100 110 120 130 140 150 

orf 135 . pep VRELSLAGEPGWRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 
I I I I I I N I I I I I I I I I [ M I M I N I ( f I I 1 M I I M II M M I I M M M M I I M ( I 
orf 135a VRELSLAGEPGWRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 
170 180 190 200 210 220 

160 170 180 190 200 

orf 135 . pep TRAYKVGDKFTVASLS YMTWFSALSAAFFLGEELFWQEILGMCI 1 1 SAVFX 

I I I I I I I I I I I I I I I II I I I I I I I I I : I I I II I I I I I I I I I I 

orf 135a TRAYKVGDKFTVASL S YMTVVFS ALSAAFFLAEELFWQEILGMCI I ILSGILSS IRPTAF 

230 240 250 260 270 280 

orfl35a KQRLQSLFRQRX 
290 300 

The complete length ORF135a nucleotide sequence <SEQ ID 543> is: 

1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTTACCATTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTGCGC TCGGGGCTGC CGCCGTATTG CGTCGGGACA CCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGGC CACCGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTTTG GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA 

451 ACGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGTGACA GGTGTGGCGA TGTCATCGGT TTGGGCGACG 

601 CTGACCGGCT GGCACACCCT GTCCTTTCCA TCGGCAGTTT ATCTGTCGTG 
651 CATCGGCGTG TCCGCGCTGA TTGCCCAACT GTCGATGACG CGCGCCTACA 

701 AAGTCGGCGA CAAATTCACG GTTGCCTCGC TTTCCTATAT GACCGTCGTT 

751 TTTTCCGCTC TGTCTGCCGC ATTTTTTCTG GCCGAAGAGC TTTTCTGGCA 

801 GGAAATACTC GGTATGTGCA TCATCATCCT CAGCGGTATT TTGAGCAGCA 

851 TCCGCCCCAC TGCCTTCAAA CAGCGGCTGC AATCCCTGTT CCGCCAAAGA 

901 TAA 

This encodes a protein having amino acid sequence <SEQ ID 544>: 

1 MDTAKKDILG SGWMLVAAAC FTIMNVLIKE ASAKFALGSG ELVFWRMLFS 
51 TVALGAAAVL RRDTFRTPHW KNHLNRSMVG TGAMLLLFYA VTHLPLATGV 
101 TLSYTSSIFL AVFSFLILKE RISVYTQAVL LLGFAGVVLL LNPSFRSGQE 
151 TAALAGLAGG AMSGWAYLKV RELSLAGEPG WRVVFYLSVT GVAMSSVWAT 
201 LTGWHTLSFP SAVYLSCIGV SALIAQLSMT RAYKVGDKFT VASLSYMTVV 
251 FSALSAAFFL AEELFWQEIL GMCIIILSGI LSSIRPTAFK QRLQSLFRQR 
301 * 



ORF135a and ORF135-1 show 99.3% identity in 300 aa overlap: 

orf 135a. pep MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL 
I I I I I I I I I I i I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I | I | | | | | | | | | 
orf 135-1 MDTAKKDILGSGWMLVAAACFTIMNVLIKEASAKFALGSGELVFWRMLFSTVALGAAAVL 

orf 135a. pep RRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE 

I I I = M II I 1 I I I I I M I I II I I I I I I II 1 I I I I I I I 1 II I [ I || I I I I I I | | I | | | | | | 
orf 135-1 RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE 

orf 135a . pep RISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG 
25 I I I I I I I 1 M M I I I I I I I I I 1 I I I I I II I I I M M | | | M | | | | M M I I I I I I I I I I I 

orf 135-1 RISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG 

orf 135a . pep WRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 
OA M I I II IN I I I M I I I I I I IN I I I I I I I M M I I I I I I I I I I I I I I I 1 I I I I I I I I I I 

orf 135-1 WRVVFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 

orf 135a. pep VASLSYMTWFSALSAAFFLAEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR 
I I I I I I I I I I I I I I I I I II I : I I I I II I I I I I I II I I I I I I I I I I I I I I I I | I I | | | | | | 
orf 135-1 VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR 

Homology with a predicted ORF from N. gonorrhoeae 

ORF135 shows 97% identity over a 201aa overlap with a predicted ORF (ORF135ng) from 
N. gonorrhoeae: 



orf135.pep GTGAMLLLFYAVTXLPLATGVTLSYTSSIF 30 
I I I I I I I I I I I I I I I I = I M II II II I I I 
orf135ng STVTLGAAAVLRRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIF 335 



orf 135 . pep LAVFS FLI LKERI S VYTQAVLLLGFAGWLLLNP S FRSGQE T AALAGLAGGAMS GWAYLK 90 

I N I I 1 II M M M M I I I I i M M I M I I I I 1 I I I M II I I I I I I I I I I I I I I I I I I I 
orfl35ng LAVFS FLILKERISVYTQAVLLLGFAGWLLLNPSFRSGQEPAALAGLAGGAMSGWAYLK 395 

orf 135 . pep VRELSLAGEPGWRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSM 150 

i I M I I II I I I I I I I I I I I : I I I | | | | | | | | | | || I I I I I I I I I I I I I I I I I I I I I I I I 
orfl35ng VRELSLAGEPGWRWFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSG1GVSALIAQLSM 455 

orf 135. pep T RAYKVGDKFT VAS LS YMT WFS AL S AAFFLGEELFWQE I LGMC III S AVF 201 

I I I I I I I I I I M I I I I I I I I I II I I I I I I I I II I II I I I I I I I I : I 

orfl35ng TRAYKVGDKFTVASLS YMTWFSAL SAAFFLGEELFWQE ILGMCI 1 1 SAAF 506 

An ORF135ng nucleotide sequence <SEQ ID 545> was predicted to encode a protein having amino 
acid sequence <SEQ ID 546>: 

1 MPSEKAFRRH LRTAS FQGLH LHHFHQKVGK CGIIGFGIHI FPTLLPA AOG 
51 ILDIQLGLFR IDFAALAVYR RTQVDFIHTV IDGIASDQAF SEWQILRRL 
101 NLGHFTDTHL IAQARRFIAD FGNIRPMRRG EAKTFCRCFR FDGIDGIHGD 
151 FRQCGHINRL APGKDCRNGK RDKVFFHTRH YNQVCLEKTN CSARKIKFRH

201 QKQAKTHSTS LAARFTIRPS LSQRPFMDTA KKDILGS GWM LVAAACFTVM 

2 51 NVLIKEASAK FALGSGELVF WRMLFSTVTL GAAAVLRRDT FRTPHWKNHL 

3 01 NRS MVGTGAM LLLFYAVTHL PLTTGVT LSY TSSIFLAVFS FLIL KERISV 
351 YTO AVLLLGF AGWLLLNPS F RSGQEPAAL AGLAGGAMSG WAYLKVRELS 
401 LAGEPGWRVV FYLSATGVAM SSVWATLTGW HTLS FPSAVY LSGIGVSALI 
451 AQLSMTRAYK VGDKFTVAS L SYMTWFSAL SAAFFL GEE L FWQEILGMCI 
501 IISAAF * 

Further work revealed the following gonococcal sequence <SEQ ID 547>: 

1 ATGGATACCG CAAAAAAAGA CATTTTAGGA TCGGGCTGGA TGCTGGTGGC 

51 GGCGGCCTGC TTCACCGTTA TGAACGTATT GATTAAAGAG GCATCGGCAA 

101 AATTTGCCCT CGGCAGCGGC GAATTGGTCT TTTGGCGCAT GCTGTTTTCA 

151 ACCGTTACGC TCGGTGCTGC CGCCGTATTG CGGCGCGACA CCTTCCGCAC 

201 GCCCCATTGG AAAAACCACT TAAACCGCAG TATGGTCGGG ACGGGGGCGA 

251 TGCTGCTGCT GTTTTACGCG GTAACGCATC TGCCTTTGAC AACCGGCGTT 

301 ACCCTGAGTT ACACCTCGTC GATTTTTttg GCGGTATTTT CCTTCCTGAT 

351 TTTGAAAGAA CGGATTTCCG TTTACACGCA GGCGGTGCTG CTCCTTGGTT 

401 TTGCCGGCGT GGTATTGCTG CTTAATCCCT CGTTCCGCAG CGGTCAGGAA 

451 CCGGCGGCAC TCGCCGGGCT GGCGGGCGGC GCGATGTCCG GCTGGGCGTA 

501 TTTGAAAGTG CGCGAACTGT CTTTGGCGGG CGAACCCGGC TGGCGCGTCG 

551 TGTTTTACCT TTCCGCAACC GGCGTGGCGA TGTCGTCggt ttgggcgacg 

601 Ctgaccggct ggCACAcccT GTCCTTTcca tcggcagttt ATCtgtCGGG 

651 CATCGGCGTG tccgcgCtgA TTGCCCAaCT GtcgatgAcg cGCGcctaca 

701 aaGTCGGCGA CAAATTCACG GTTGCCTCGC tttCCtaTAt gaccgtcGTC 

751 TTTTCCGCCC TGTCTGCCGC ATTTTTTCTg ggcgaagagc tttTCtggCA 

801 GGAAATACTC GGTATGTGCA TCATTAtccT CAGCGGCATT TTGAGCAGCA 

851 TCCGCCCCAT TGCCTTCAAA CAGCGGCTGC AAGCCCTCTT CCGCCAAAGA 

901 TAA 

This corresponds to the amino acid sequence <SEQ ID 548; ORF135ng-1>:

1 MDTAKKDILG SGWMLVAAA C FTVMNVLIKE ASAKFALGSG ELVFWRMLFS 

51 TVT LGAAAVL RRDTFRTPHW KNHLNRS MVG TGAMLLLFYA VTHL PLTTGV 

101 T LSYTSSIFL AVFSFLIL KE RISVYTQ AVL LLGFAGWLL LNPSF RSGQE 

151 PAALAGLAGG AMSGWAYLKV RELSLAGEPG WRWFYL SAT GVAMSSVWAT 

201 LTGWHTLS FP SAVYLSGIGV SALIA QLSMT RAYKVGDKFT VASLSYMTVV 

251 FSALSAAFFL GEELFWQ EIL GMCIIILSGI LSSI RPIAFK QRLQALFRQR 

301 * 

ORF135ng-1 and ORF135-1 show 97.0% identity in 300 aa overlap:

orf 135ng-l pep MDT AKKD I LG S GWMLVAAACFTVMNVL IKE AS AKFALG S GE LVFWRML FSTVT LGAAAVL 
I I I I I I I I I I I I I I I I I 1 I I I I : I I I I I I I I I I I I M I I I I I I I I I I I I I I I : I I I I M I 
orf 135-1 MDTAKKDILG SGWMLVAAAC FT IMNVLIKE AS AKFALGSGELVFWRMLFSTVALGAAAVL 

orfl35ng-l.pep RRDTFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLTTGVTLSYTSSIFLAVFSFLILKE 
I I I : [ I [ I I I I I I II I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I 
orf 135-1 RRDXFRTPHWKNHLNRSMVGTGAMLLLFYAVTHLPLATGVTLSYTSSIFLAVFSFLILKE 

orf 135ng-l . pep RISVYTQAVLLLGFAGWLLLNPSFRSGQEPAALAGLAGGAMSGWAYLKVRELSLAGEPG 
1 I 1 I I I i M I I I I I I I I I I I I I I I I I I I I i 1 I I I I I I I I I I I I I I I I I M M I I I I M I 
orf 135-1 RISVYTQAVLLLGFAGWLLLNPSFRSGQETAALAGLAGGAMSGWAYLKVRELSLAGEPG 

orf 135ng-l . pep WRWFYLSATGVAMSSVWATLTGWHTLSFPSAVYLSGIGVSALIAQLSMTRAYKVGDKFT 
I I I I I I I I : I I I [ I I I I I I I I I I I I I I I I I I I I! I [ I I I I I I I I I I I I I I I I I I I II 1 I 
orf 135-1 WRWFYLSVTGVAMSSVWATLTGWHTLSFPSAVYLSCIGVSALIAQLSMTRAYKVGDKFT 

orfl35ng-l.pep VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPIAFKQRLQALFRQR 
I II I I I j I I I I I I I I I II I I I I II I I I I I I I I I I II M I I I M I I I I II I I I I : I I I I I 
orf 135-1 VASLSYMTVVFSALSAAFFLGEELFWQEILGMCIIILSGILSSIRPTAFKQRLQSLFRQR 

Based on this analysis, including the presence of several putative transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
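
The text does not say how the putative transmembrane domains were predicted. As an illustration only, the sketch below scans a protein with a sliding-window Kyte-Doolittle hydropathy average, a common rule of thumb for flagging candidate membrane-spanning stretches. The function names, the 19-residue window and the 1.6 threshold are assumptions made for this example, not parameters taken from the analysis above.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window along the sequence.

    Unknown symbols (for example X or *) score 0.0 so that window
    positions stay aligned with the original sequence (an assumption).
    """
    values = [KD.get(aa, 0.0) for aa in protein.upper()]
    if len(values) < window:
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose mean score >= threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, score in enumerate(profile) if score >= threshold]


# First 50 residues of ORF135ng-1 as listed above (spaces removed).
ORF135NG_1_START = "MDTAKKDILGSGWMLVAAACFTVMNVLIKEASAKFALGSGELVFWRMLFS"
candidate_starts = hydrophobic_windows(ORF135NG_1_START)
```

Windows flagged this way are only candidates; a dedicated topology predictor would be needed to confirm them.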
The following DNA sequence was identified in N. meningitidis <SEQ ID 549>:

1 ATGAAGCGGC GTATAGCCGT CTTCGTCCTG TTCCCGCAGA TAATCCGAGT 

51 TTTGGGACAA CTGTTGCCGA AAATCGTCAA TACAGTTCCG GCACATCGGA 

101 TGCTCTTCCA GATTTTCGGG ATGTTCTTTT TCTTCATACA CCAGCAATAT 

151 CTGCCCGGGA TCGCCGAAAT CGATTCCCCA TGCGGCATCG TGTTCGGTGC 

201 GCTCCTCTTC CGTCATCTGC CCGCGCATTG CCTGTATGGT AAAGCCGCCG 

251 TAGGGGATGC CgTTGCACAC GAACATCCAG TCGCTGATGT CGTCAACCGG 

301 AACGCAAACG cTTTCGCCTT GTTCGACATT GGTCAGTTCG CCsGGTTCAT 

351 TGTTCAGCAC AC CGTAAAT A TAAAGACCGT CAAAATAAAT ATCGTCGATC 

4 01 CACATATGTT CGCAAATTTC GCCGTCTTCG CCGTCTTGGA AAAAAGGGAC 

451 TTTGACCATG G CAAAATCCA AGGCGGAAAT AATGCGGCGG CGTTCCCAAA 

501 AAAGcTCGCG C C AAAAAT AT TTGAATGTTT TACGGGCGCG TTCGTCGGCA 

551 CGGTTTACCG GTTCGTCTGC CTGTTCTACA TAATAAATGA CGGAATCGCC 

601 CATCATATCT GCTCCTCAAC GTGTACGGTA TCTGTTTGCA CCTTACTGCG 

651 GCTTTCTgcC kTCGGCATCC GATTCGGATT TGAAAAGTTC mmrwyATTCG 

7 01 GAATAG 

This corresponds to the amino acid sequence <SEQ ID 550; ORF136>: 



1 MKRRIAVFVL FPQIIRVLGQ LLPKIVNTVP AHRMLFQIFG MFFFFIHQQY 

51 LPGIAEIDSP CGIVFGALLF RHLPAHCLYG KAAVGDAVAH EHPVADWNR 

101 NANAFALFDI GQFAXFIVQH TVNIKTVKIN IVDPHMFANF AVFAVLEKRD 

151 FDHGKIQGGN NAAAFPKKLA PKIFECFTGA FVGTVYRFVC LFYIINDGIA 

201 HHSAPQRVRY LFAPYCGFLP SASDSDLKSS XXSE* 

Further work revealed the complete nucleotide sequence <SEQ ID 551>:

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGTTCCCGC AGATAATCCG 

51 AGTTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAGATTTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TATCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

2 01 TGCGCTCCTC TTCCGTCATC TGCCCGCGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT 

351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG 

4 01 AT CCACAT AT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG 

4 51 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC 

501 AAAAAAGCTC GCGCCAAAAA TATTTGAATG TTTTACGGGC GCGTTCGTCG 

551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC 

601 GCCCATCATT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG 

651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 

This corresponds to the amino acid sequence <SEQ ID 552; ORF136-1>:

1 MMKR RIAVFV LFPQIIRVLG QL LPKIVNTV PAHRMLFQIF GMFFFFIHQQ 

51 YLPGIAEIDS PCGIVFGALL FRHLPAHCLY GKAAVGDAVA HEHPVADVVN 

101 RNANAFALFD IGQFAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR 

151 DFDHGKIQGG NNAAAFPKKL APKIFECFT G AFVGTVYRFV CLFYIIN DGI 

201 AHHSAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF136 shows 71.7% identity over a 237aa overlap with an ORF (ORF136a) from strain A of
N. meningitidis:



10 20 30 40 50 59 

orfl3 6.pep MKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 

I I I I I I I I I I : I II: I I I I I I I I I | | I | | | | I I I I I I I 

or ± 136a MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS 
10 20 30 40 50 60 
60 70 80 90 100 110 119

nrf1 3 6 DeD PCGIV FGALLFRHLPAHCLYGKAAVGDAVAHEHPVADVVNRNANAFALFDIGQFAXFIVQ 

P P IMMIhlllll :IMIIIIIIl:IIIIIIM I I I I I I I I I I I I I I I INI 

5 orf 13 6a PCGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADWNRNANAFALFDIGQFAGFIVQ 

70 80 90 100 110 120 

120 130 140 150 160 170 179 

orf 136 pep HTWIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 
10 ' I :: I : I I I i I I I I I I I I I I I I I I I I I N I : : I : N I : : : : 

orf 13 6a HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA 
130 140 150 160 170 180 

180 190 200 210 220 230 

15 orf 136 pep AFVGTVYRFVCLFYIINDGIAHH SAPQRVRYLFAPYCGFLPSASDSDLKSSXXSEX 

" P : ||:| : ::: I I I ! 1 I I I II I I 1 I I I I 

orf 13 6a R— SPARFTGLSACSTXXMTESPIISAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 

190 200 210 220 230 

The complete length ORF136a nucleotide sequence <SEQ ID 553> is: 

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG

51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAGATNTTC GGGATGTTCT TTTTCTTCAT ACACCAGCAA 

151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCATGCGGCA TCGTGTTCGG 

201 TACGCTCCTC TTCCGTCATC NGTCCACGCA TTGCCTGTAT GGTAAAGCCG 

251 CCGTAGGGAA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGTCAAC

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT TCGCCGGGTT 

351 CATTGTTCAG CACGCCATAA ATGTAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCNTCT TCGCCGTCTT GGAAAAAAGG

451 GCTTTGACCA TGGCAAAATC TAAGGNGNNA NNGATGCGGC GGCGTTCCCA

501 AAAAAGCTCG CGCCAAAAAT ATTTGAATGT TTTGCGGGCG CGTTCGCCGG

551 CACGGTTTAC CGGTTTGTCT GCCTGTTCTA CATAATAAAT GACGGAATCG 

601 CCCATCATAT CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACTG 

651 CGGCTTTCTG CCTTCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT 

701 CGGAATAG 

This encodes a protein having amino acid sequence <SEQ ID 554>:

1 MMKRR IAVFV LLMQKIRILG QL LPKIVNTV PAHRMLFQXF GMFFFFIHQQ 

51 YLPGIAEIDS PCGIVFGTLL FRHXSTHCLY GKAAVGNAVA HEHPVADWN 

101 RNANAFALFD IGQFAGFIVQ HAINVKTVKI NIVDPHMFAN FAXFAVLEKR 

151 ALTMAKSKXX XMRRRSQKSS RQKYLNVLRA RSPARFTGLS ACST**MTES 

201 PIISAPQRVR YLFAPYCGFL PSASDSDLKS SKYSE*

ORF136a and ORF136-1 show 73.1% identity in 238 aa overlap: 

10 20 30 40 50 60 

orf 136a. pep MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQXFGMFFFFIHQQYLPGIAEIDS 
I I I I I I I I I I I : I I I : I I I I I ] I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I 1 
45 orf 13 6-1 MMKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 136a. pep PCGIVFGTLLFRHXSTHCLYGKAAVGNAVAHEHPVADWNRNANAFALFD IGQFAGFIVQ 

50 I I I I I II : I II I I : I I I I II N I I : I II II I II II II II II II II II II II II I I I I I 

orf 136-1 PCGIVFGALLFRHLPAHCLYGKAAVG DA VAHEHPVADWNRNANAFALFD IGQFAGFIVQ 

70 80 90 100 110 120 

130 140 150 160 170 180 

55 orf 136a . pep HAINVKTVKINIVDPHMFANFAXFAVLEKRALTMAKSKXXXMRRRSQKSSRQKYLNVLRA 

I :: I : I 1 I I I I I I I I I I II I I I I II II I I : : I : I = I : : : : 

orf 136-1 HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 
130 140 150 160 170 180 

60 190 200 210 220 230 

orf 136a. pep R S PARFTGLS ACSTXXMTE S P 1 1 SAPQRVRYLFAPYCGFLPS AS DS DLKS SKYSEX 

: I I : I : : : : I I I 1 I 1 II I I II II II I I I I I I II I II I I I II I 
orf 13 6-1 AFVGTVYRFVCLFYIINDGIAHH SAPQRVRYLFAPYCGFLPSASDS DLKS SKYSEX 
Homology with a predicted ORF from N. gonorrhoeae

ORF136 shows 92.3% identity over a 234aa overlap with a predicted ORF (ORF136ng) from
N. gonorrhoeae:

orfl36 pep MKRRIAVFVLFPQIIRVLGQLLPKIWVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 59 

Ml I : I I I : I I I I I I I I M I I I I I i I i I I I I I I i I I I I I = I I I I I I N I I I 

orfl36ng MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS 60 

10 orfl36 pep PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADWNRNANAFALFDIGQFAXFIVQ 119 

| ||MI:II1MI I I I I I 1 I I I I 1 I 1 I 1 I 1 11 I I I I : I I I I I I I I I 1 I I I I I 1 1 N 
orf 13 6ng PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ 120 

orfl36 pep HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 179 
15 | | | [ | | | | I I I I I I I I I I I I I I I M I I I 1 I I I I I I 1 1 I 1 I I 1 I 1 I 1 I I I i M I : 1 1 I I I I 

orf 13 6ng HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG 18 0 

orf 13 6 pep AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSXXSE 234 
| | : | | | | I I I I I I I I I I I I I I I I : I I I I I I I M I I I I I I I I I M I 1 I I ! M 
20 orfl3 6ng AFAGTVYRFVCLFYIINDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSE 235 

The complete length ORF136ng nucleotide sequence <SEQ ID 555> is: 

1 ATGATGAAGC GGCGTATAGC CGTCTTCGTC CTGCTCATGC AGAAAATCCG 

51 GATTTTGGGA CAACTGTTGC CGAAAATCGT CAATACAGTT CCGGCACATC 

101 GGATGCTCTT CCAAATTTTC GGGATGTTCT TTTTCTTCAT ACACCGGCAA 

151 TACCTGCCCG GGATCGCCGA AATCGATTCC CCAGGCGGTA TCGTGTTCGG

201 TACGCTCCTC TTCCGTCATC TGTCCGCGCA TTGCCTGTAC GGTAAAGCCG 

251 CCGTAGGGGA TGCCGTTGCA CACGAACATC CAGTCGCTGA TGTCGCCAAC 

301 CGGAACGCAA ACGCTTTCGC CTTGTTCGAC ATTGGTCAGT CCGCCGGGTT 

351 CATTGTTCAG CACACCGTAA ATATAAAGAC CGTCAAAATA AATATCGTCG 

401 ATCCACATAT GTTCGCAAAT TTCGCCGTCT TCGCCGTCTT GGAAAAAAGG

451 GACTTTGACC ATGGCAAAAT CCAAGGCGGA AATAATGCGG CGGCGTTCCC 

501 AAAAAAGCTC GCGCCAAAAG TATTTGAATG TTTTACGGGC GCGTTCGCCG 

551 GCACGGTTTA CCGGTTCGTC TGCCTGTTCT ACATAATAAA TGACGGAATC 

601 GCCCATCATA CTGCTCCTCA ACGTGTACGG TATCTGTTTG CACCTTACCG 

651 CGGTTTTCTA CCTCCGGCAT CCGATTCGGA TTTGAAAAGT TCCAAATATT

701 CGGAATAG

This encodes a protein having amino acid sequence <SEQ ID 556>: 

1 MMKRR IAVFV LLMQKIRILG QL LPKIVNTV PAHRMLFQIF GMFFFFIHRQ 

51 YLPGIAEIDS PGGIVFGTLL FRHLSAHCLY GKAAVG D AVA HEHPVADVAN 

101 RNANAFALFD IGQSAGFIVQ HTVNIKTVKI NIVDPHMFAN FAVFAVLEKR

151 DFDHGKIQGG NNAAAFPKKL APKVFECFTG AFAGTVYRFV CLFYIINDGI

201 AHHTAPQRVR YLFAPYRGFL PPASDSDLKS SKYSE*

ORF136ng and ORF136-1 show 93.6% identity in 235 aa overlap: 

orfl36ng MMKRRIAVFVLLMQKIRILGQLLPKIVNTVPAHRMLFQIFGMFFFFIHRQYLPGIAEIDS 
45 II II I I I I I I I : I I I : I I I I I I I I I I I I I I M M I I I I I I I II M I : I I I I I I M I I I 

orf 136-1 MMKRRIAVFVLFPQIIRVLGQLLPKIVNTVPAHRMLFQIFGMFFFFIHQQYLPGIAEIDS 

orfl36ng PGGIVFGTLLFRHLSAHCLYGKAAVGDAVAHEHPVADVANRNANAFALFDIGQSAGFIVQ 
I | | || I : H I I I I II I I I II I I I I I II I I I I I I I I I : I I I I II II I I I I I I I I I I I I 
50 or f 1 3 6-1 PCGIVFGALLFRHLPAHCLYGKAAVGDAVAHEHPVADVWRNANAFALFDIGQFAGFIVQ 

orfl3 6ng HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKVFECFTG 
I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 1 I I I I I : I I I I I I 
orf 13 6-1 HTVNIKTVKINIVDPHMFANFAVFAVLEKRDFDHGKIQGGNNAAAFPKKLAPKIFECFTG 

55 

orfl3 6ng AFAGTVYRFVCLFYIINDGIAHHTAPQRVRYLFAPYRGFLPPASDSDLKSSKYSEX 
I I : I I II I I II I I I 1 I I I I I I II : I I I I I I I I I I I I I I I I I II I I I I I I I I I I I 
orf 13 6-1 AFVGTVYRFVCLFYIINDGIAHHSAPQRVRYLFAPYCGFLPSASDSDLKSSKYSEX 
Based on the presence of the putative transmembrane domains in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 67 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 557>: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CC . TGCGGAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACC TCCGCAGGTT 

251 CGATTGTCGG CAACCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAATGGG TTTATCAAAG GCGCAAAGCT GCAAAATTAC ATCAACCGAA 

4 01 AACTCCGCGG CATGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCC. . 

This corresponds to the amino acid sequence <SEQ ID 558; ORF137>: 

1 MENMVTFSKI RPLLAIAAAA LLAAXRTAGN NAVRKPVQTA KPAAWGLAL 

51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGNLF ASGMSPDRLE 

101 LEAEILGKTD LVDLTLSTNG FIKGAKLQNY INRKLRGMQI QQFPIKFAA. . 

Further work revealed the complete nucleotide sequence <SEQ ID 559>: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGTCC 

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT 

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

2 51 CGATTGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AAATTTTAGG CAAAACCGAT TTGGTCGATT TAACCTTGTC 

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

4 01 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AGGGGAATGC 

501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG 

551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG 

601 CCCGTCAGTG CCGCCCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA 

651 TATTTCCGCC CGTCCGGGCA AAAAC AT CAG CCAAGGTTTC TTCTCTTATC 

7 01 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CTGCGTTGCA AAATGAGTTG 

751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG 

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This corresponds to the amino acid sequence <SEQ ID 560; ORF137-1>:

1 MENMVTFSKI RPLLAIAAAA LLAA CGTAGN NAVRKPVQTA KPAAWGLAL 

51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGSLF ASGMSPDRLE 

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV 

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV 

201 PVSAARRQGA NFVIAVDISA RPGKNTSQGF FSYLDQTLNV MSVSALQNEL 

251 GQADWIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY 

301 * 
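
The correspondence between a listed nucleotide sequence and its amino acid sequence can be checked by translating the DNA with the standard genetic code. The following is a minimal sketch under stated assumptions: the helper names `clean` and `translate` are hypothetical, reading starts at the first base, and any codon containing an ambiguous base is reported as `X`, mirroring how unknown residues appear in the listings.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def clean(seq):
    """Drop position numbers, spaces and punctuation; upper-case the bases."""
    return "".join(ch for ch in seq.upper() if ch.isalpha())


def translate(dna):
    """Translate from the first base, one codon at a time.

    Codons that are not in the table (for example those containing N)
    become X; stop codons become *. A trailing partial codon is ignored.
    """
    dna = clean(dna)
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# First 30 bases of SEQ ID 559, copied from the listing above.
first_ten = translate("ATGGAAAATA TGGTAACGTT TTCAAAAATC")
```

Here `first_ten` reproduces the first ten residues of SEQ ID 560 (MENMVTFSKI), so the same routine can be used to spot rows where the extracted text has been damaged.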

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF137 shows 93.3% identity over a 149aa overlap with an ORF (ORF137a) from strain A of N. 
meningitidis: 
MENMVTFSKIRPLLAIAAAALLAAXRTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH
I I I I I I I I I I I i I I I I II ill I I I I II II I : I M I I I I I I I I I I I I I I I 1 I I II II I I 
MENMVTFSKIRPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAWGLALGGGASKGFAH 



10 20 30 40 50 60



70 80 90 100 110 120 

VGIIKVLKENGIPVKVVTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG 
1 I I I I I II I I I I I I I I I i II I 1 1 I I I I : II I II 1 I I I I I I I I I I II 1 I I 1 I I II I I II : I 
VGIIKVLKENGIPVKWTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 

70 80 90 100 110 120 

130 140 149 

FIKGAKLQNYINRKLRGMQIQQFPIKFAA 



The complete length ORF137a nucleotide sequence <SEQ ID 561> is: 



1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGACCGCTTT TGGCAATCGC
51 CGCCGCCGCG TTGCTTGCCG CCTGCGGCAC GGCGGGAAAT AATGCTGCCC
101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGG TTTGGCACTC
151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT GTAGGTATTA TTAAGGTTTT
201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT
251 CGATAGTCGG CAGCCTTTTT GCATCGGGTA TGTCGCCCGA CCGCCTCGAA
301 TTGGAAGCCG AAATTTTAGG TAAAACCGAT TTGGTCGATT TAACCTTGTC
351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA
401 AAGTCGGCGG CAGGCGGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT
451 GCTACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC
501 CGGGCAGGCT GTGCGCGCTT CCGCCGCCAT TCCCAATGTG TTCCAACCCG
551 TTATCATCGG CAGGCATACA TATGTTGACG GCGGTCTGTC GCAGCCCGTG
601 CCCGTCAGTG CCGCCCGGCG GCANGNNNNG NATNTCGTGA TTGCCGTCGA
651 TATTTCCGCC CGTCCGAGCA AAAACATCAG CCAAGGCTTC TTCTCTTATC
701 TCGATCAGAC GCTGAACGTA ATGAGCGTTT CCGCGTTGCA AAATGAGTTG
751 GGGCAGGCGG ATGTGGTTAT CAAACCGCAG GTTTTGGATT TGGGTGCAGT
801 CGGCGGATTC GATCAGAAAA AACGCGCCAT CCGGTTGGGT GAGGAGGCAG
851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT
901 TGA



This encodes a protein having amino acid sequence <SEQ ID 562>: 



1 MENMVTFSKI RPLLAIAAAA LLAACGTAGN NAARKPVQTA KPAAVVGLAL
51 GGGASKGFAH VGIIKVLKEN GIPVKVVTGT SAGSIVGSLF ASGMSPDRLE
101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRRI QQFPIKFAAV
151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHT YVDGGLSQPV
201 PVSAARRXXX XXVIAVDISA RPSKNISQGF FSYLDQTLNV MSVSALQNEL
251 GQADVVIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY
301 *



ORF137a and ORF137-1 show 97.3% identity in 300 aa overlap: 



50 



orf 137a . pep MENMVTFSKIRPLLAIAAAALLAACGTAGNNAARKPVQTAKPAAWGLALGGGASKGFAH 
I N I ! I I I I I I I I I I I I I I I I I I I I I I I j | | | : | | | I I I I I I I I I I I I I I I I I I I I I i 1 I 
orf 1 3 7 - 1 MENMVTFSKIRPLLAIAAAALLAACGTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH 

orf 137a. pep VGIIKVLKENGIPVKWTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 
I I I I I i I I I M I I M I M I I I I M M I I M I f I I I I II I I I II M II II II II II I I I M 

orf 137-1 VGIIKVLKENGIPVKWTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 

orf 137a . pep FIKGEKLQNYINRKVGGRRIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 
M I I I I I I I I I I I I I I I I : I I I I I II I I I I I I I M I I I I I I I I II I I M I I | | I | || | | | 
orf 137-1 FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 

orf 137a . pep FQPVIIGRHTYVDGGLSQPVPVSAARRXXXXXVIAVDISARPSKNISQGFFSYLDQTLNV 

, 11 illl II I II II II II III 1111111:11111 || 

orf 137-1 FQPVIIGRHTYVDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQG FFSYLDQTLNV 

orf 137a . pep MSVSALQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPE IKRKLAAYRY 
I I M I 1 I I M I M M I I I I I I U I I I I I I I I I I I 1 II I I I I I I I I I I I I I I I I I I I I I I I
or f 137 -1 MSVSALQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 

Homology with a predicted ORF from N. gonorrhoeae 
ORF137 shows 89.9% identity over a 149aa overlap with a predicted ORF (ORF137ng) from
N. gonorrhoeae: 

or f 137 pep MENMVT F SK I RPLLAI AAAALLAAXRT AGNNAVRKPVQT AKPAAWGLALGGGAS KG FAH 60 

| | | | M I I I I I : I I I I M I I I I I I I I I I I : 1 1 1 I I I I I I M I I : I II I I 1 11 

orfl37ng ME NMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTAKPAAWALALGGGASKGFAH 60 

^ orfl37 pep VGIIKVLKENGIPVKWTGTSAGSIVGNLFASGMSPDRLELEAEILGKTDLVDLTLSTNG 120 

: [ | : | [ I I I I I I I M I I I I I I M I I II : I : I I I I M I I I I I 1 1 1 I I I I I I I I 1 I I I I I = I 
orfl37ng IGIVKVLKENGIPVKWTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG 120 

15 orfl37.pep FI KGAKLQNY I NRKLRGMQ I QQFP I KFAA 149 

MM I I II I I I I I : I I I I I I I I I I I I 
orf 137ng FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 180 

The complete length ORF137ng nucleotide sequence <SEQ ID 563> is: 

1 ATGGAAAATA TGGTAACGTT TTCAAAAATC AGATCATTTT TGGCAATCGC 

51 CGCCGCCGCG TTGCTTGCCG CCTGCGGTAC GGCGGGAAAC AATGCCGCCC

101 GCAAGCCGGT GCAAACCGCC AAACCCGCCG CAGTGGTCGC TTTGGCACTC 

151 GGTGGCGGCG CATCTAAAGG ATTTGCCCAT ATAGGAATTG TTAAGGTTTT

201 GAAAGAAAAC GGTATTCCTG TGAAGGTGGT TACCGGCACA TCGGCAGGTT 

251 CGATAGTCGG CAGCCTTTTG GCATCGGGTA TGTCGCCCGA CCGCCTCGAA 

301 TTGGAAGCCG AGATTTTAGG TAAAACCGAT TTAGTCGATT TAACCTTGTC

351 CACCAGTGGT TTTATCAAAG GCGAAAAGCT GCAAAATTAC ATCAACCGAA 

401 AAGTCGGCGG CAGGCAGATT CAGCAGTTTC CCATCAAATT TGCCGCCGTT 

451 GCCACTGATT TTGAAACCGG CAAGGCCGTC GCTTTCAATC AAGGGAATGC 

501 CGGGCAGGCG GTTCGTGCTT CCGCCGCCAT TCCCAATGTG TTCCAGCCAG 

551 TCATCATCGG CAGGCACAAA TATGTTGACG GCGGTCTGTC GCAGCCCGTG

601 CCCGTCAGTG CCGCTCGGCG GCAGGGGGCG AATTTCGTGA TTGCCGTCGA 

651 TATTTCCGCA CGTCCGAGCA AAAATGTCGG TCAAGGTTTC TTCTCTTATC 

701 TCGATCAGAC GCTGAACGTG ATGAGCGTTT CCGTGTTGCA AAACGAGTTG 

751 gggcAGGCGG ATGTGGTTAT CAAACCGCag gtTTTGGATT TGGGTGCAGT 

801 CGGCGGATTC GATCAGAAAA AGCGCGCCAT CCGGTTGGGC GAGGAGGCAG

851 CACGTGCCGC ATTGCCTGAA ATCAAACGCA AACTGGCGGC ATACCGTTAT 

901 TGA 

This encodes a protein having amino acid sequence <SEQ ID 564>: 

1 MENMVTFSK I RSFLAIAAAA LLAAC GTAGN NAARKPVQTA KPAAWALAL 

51 GGGASKGFAH IGIVKVLKEN GIPVKWTGT SAGSIVGSLL ASGMSPDRLE

101 LEAEILGKTD LVDLTLSTSG FIKGEKLQNY INRKVGGRQI QQFPIKFAAV 

151 ATDFETGKAV AFNQGNAGQA VRASAAIPNV FQPVIIGRHK YVDGGLSQPV 

2 01 PVSAARRQGA NFVIAVDISA RPSKNVGQGF FSYLDQTLNV MSVSVLQNEL 

2 51 GQADWIKPQ VLDLGAVGGF DQKKRAIRLG EEAARAALPE IKRKLAAYRY 

301 *

ORF137ng and ORF137-1 show 96.0% identity in 300 aa overlap: 

orfl37ng MENMVT FSKI RS FLAI AAAALLAACGTAGNNAARKPVQT AKPAAWALALGGGASKGFAH 

I I I I M M M I : I I I II I I I I I I I 1 I 1 I I II : I I I I I I I I I I I I I : I I I I I II I II I I I 

orf 137-1 MENMVTFSKIRPLLAIAAAALLAACGTAGNNAVRKPVQTAKPAAWGLALGGGASKGFAH 

50 

orf 137ng IGIVKVLKENGIPVKWTGTSAGSIVGSLLASGMSPDRLELEAEILGKTDLVDLTLSTSG 
: I I : I II M M I M M I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 137-1 VGIIKVLKENGIPVKWTGTSAGSIVGSLFASGMSPDRLELEAEILGKTDLVDLTLSTSG 

55 orfl37ng FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 

I I I II I I II II I I I I II I I M I I I I I I I II I I I I I I I [ M I 1 M I I I I I I I II I I M I I I 

orf 137-1 FIKGEKLQNYINRKVGGRQIQQFPIKFAAVATDFETGKAVAFNQGNAGQAVRASAAIPNV 
orf 137-1 FQPVIIGRHTYVDGGLSQPVPVSAARRQGANFVIAVDISARPGKNISQGFFSYLDQTLNV

orfl37ng MSVSVLQNELGQADVVIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 
I I I I : I I I I I I I I I I I I I 1 I I I I I 1 I I I I I 1 1 I I II I I I I I I I I II I I I I I I I 1 ! I M I I 
orf 137 MSVSALQNELGQADWIKPQVLDLGAVGGFDQKKRAIRLGEEAARAALPEIKRKLAAYRY 

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site 
(underlined) in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 
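
The lipoprotein lipid attachment site mentioned above centres on a cysteine near the end of the signal peptide. As a rough illustration, the sketch below searches for a simplified lipobox-like consensus, `[LVI][ASTVI][GAS]C`, close to the N-terminus. This pattern, the 40-residue cut-off and the function name `find_lipobox` are assumptions chosen for the example; the text does not state which pattern or tool produced the original prediction.

```python
import re

# Simplified lipobox-like consensus ending in the lipid-modified cysteine.
LIPOBOX = re.compile(r"[LVI][ASTVI][GAS]C")


def find_lipobox(protein, max_cys_position=40):
    """Return the 1-based position of the first candidate cysteine.

    Only motifs whose cysteine lies within the first `max_cys_position`
    residues are accepted; None is returned when nothing qualifies.
    """
    for match in LIPOBOX.finditer(protein.upper()):
        cys_position = match.end()  # end() is exclusive, so it equals the 1-based index of C
        if cys_position <= max_cys_position:
            return cys_position
    return None


# First 40 residues of the gonococcal protein <SEQ ID 564> (spaces removed).
ORF137NG_START = "MENMVTFSKIRSFLAIAAAALLAACGTAGNNAARKPVQTA"
candidate = find_lipobox(ORF137NG_START)
```

For this fragment the motif LAAC is found with its cysteine at residue 25. A hit of this kind is only suggestive and does not replace a full signal-peptide analysis.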



Example 68 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 565>: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGcTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCmAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTC. . 

This corresponds to the amino acid sequence <SEQ ID 566; ORF138>: 

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL PLSCLHTLGN RLGHLAFYLL 

51 KEDRARIVAX MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLF 

Further work revealed the complete nucleotide sequence <SEQ ID 567>: 

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA 

51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT 

101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA 

151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA 

201 CCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG 

251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA 

301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA 

351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG 

401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC 

451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT 

501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA 

551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC 

601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG 

651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG 

7 01 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT 

7 51 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC 

8 01 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT 
851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA 

This corresponds to the amino acid sequence <SEQ ID 568; ORF138-1>:

1 MFRLQFRLFP PLRTAMH ILL TALLKCLSLL PLSC LHTLGN RLGHLAFYLL 

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQIIKALRS GEATIVLPDH 

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG 

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP* 



Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF138 shows 99.2% identity over a 123aa overlap with an ORF (ORF138a) from strain A of
N. meningitidis:

10 20 30 40 50 60 

orfl38 Deo MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX 

| | | | | | M I I I I I I I I II I I I I i I I I I I 1 M I I I I I I I I I I I I I I 1 1 

orfl38a MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 



10 20 30 40 50



70 80 90 100 110 120 

MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

| | | | | | | | | | || I || I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

70 80 90 100 110 120 



orfl38.pep 
orfl38a 



LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 
130 140 150 160 170 180 
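
The identity figures quoted for each alignment can be reproduced by counting identical columns between two aligned sequences. The sketch below is a minimal illustration, assuming gaps are written as `-` and excluded from the overlap; the function name `percent_identity` is hypothetical and the original alignment program may count differently. With one mismatch (X against N) in 123 aligned residues, 122/123 rounds to the 99.2% stated above.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (matches, overlap, percent) for two equal-length aligned strings.

    Columns containing a gap in either sequence are left out of the
    overlap (an assumption made for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap
    ]
    matches = sum(1 for a, b in columns if a == b)
    overlap = len(columns)
    percent = 100.0 * matches / overlap if overlap else 0.0
    return matches, overlap, percent


# First 60-residue block of the ORF138 / ORF138a alignment shown above.
ORF138_BLOCK = (
    "MFRLQFRLFP" "PLRTAMHILL" "TALLKCLSLL"
    "PLSCLHTLGN" "RLGHLAFYLL" "KEDRARIVAX"
)
ORF138A_BLOCK = (
    "MFRLQFRLFP" "PLRTAMHILL" "TALLKCLSLL"
    "PLSCLHTLGN" "RLGHLAFYLL" "KEDRARIVAN"
)
matches, overlap, percent = percent_identity(ORF138_BLOCK, ORF138A_BLOCK)
```

In this block the only differing column is the final one, where the unresolved residue X in ORF138 faces N in ORF138a.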



The complete length ORF138a nucleotide sequence <SEQ ID 569> is: 



1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA
51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG CCGCTTTCCT
101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA
151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGTCAGG CAGGCATGAA
201 TCCCGACCCC AAAACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAAGGCG
251 GTTTGGAACT TGCCCCCGCG TTTTTCAGAA AACCGGAAGA CATAGAAACA
301 ATGTTCAAAG CGGTACACGG CTGGGAACAT GTGCAGCAGG CTTTGGACAA
351 ACACGAAGGG CTGCTATTCA TCACGCCGCA CATCGGCAGC TACGATTTGG
401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCCGCTGAC CGCCATGTAC
451 AAACCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT
501 TCGCGGCAAA GGAAAAACCG CGCCTACCAG CATACAAGGG GTCAAACAAA
551 TCATCAAAGC CCTGCGTTCG GGCGAAGCAA CCATCGTCCT GCCCGACCAC
601 GTCCCCTCCC CTCAAGAAGG CGGGGAAGGC GTATGGGTGG ATTTCTTCGG
651 CAAACCTGCC TATACCATGA CGCTGGCGGC AAAATTGGCA CACGTCAAAG
701 GCGTGAAAAC CCTGTTTTTC TGCTGCGAAC GCCTGCCTGG CGGACAAGGT
751 TTCGATTTGC ACATCCGCCC CGTCCAAGGG GAATTGAACG GCGACAAAGC
801 CCATGATGCC GCCGTGTTCA ACCGCAATGC CGAATATTGG ATACGCCGTT
851 TTCCGACGCA GTATCTGTTT ATGTACAACC GCTACAAAAT GCCGTAA



This encodes a protein having amino acid sequence <SEQ ID 570>: 

1 MFRLQFRLFP PLRTAMH ILL TALLKCLSLL PLS CLHTLGN RLGHLAFYLL 

51 KEDRARIVAN MRQAGLNPDP KTVKAVFAET AKGGLELAPA FFRKPEDIET 

101 MFKAVHGWEH VQQALDKHEG LLFITPHIGS YDLGGRYISQ QLPFPLTAMY 

151 KPPKIKAIDK IMQAGRVRGK GKTAPTSIQG VKQI IKALRS GEATIVLPDH 

201 VPSPQEGGEG VWVDFFGKPA YTMTLAAKLA HVKGVKTLFF CCERLPGGQG 

251 FDLHIRPVQG ELNGDKAHDA AVFNRNAEYW IRRFPTQYLF MYNRYKMP* 

ORF138a and ORF138-1 show 99.7% identity over a 298aa overlap: 

orf 138a . pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 

I I I I I Ill I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I 

orf 138-1 MFRLQFRLFP PLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARI VAN 

orf 138a . pep MRQAGMNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 
I I II I : I II I I I I I I I I i I I I I I I I I I I I I II I I I I I I I I I I I I I II I I I I I I I I I I I I I 
orf 138-1 MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 

orf 138a. pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 
I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I 
orf 138-1 LL FI TPHIGSYDLGGRYI S QQLP FPLTAMYKPPKIKAI DKIMQAGRVRGKGKTAPT S I QG 

orf 138a . pep VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF 
I I I I I I I [ I I I I I I I I I I I I I I I I I I I I I I I I M I I I M M I I M I M I I I I I I I I I I I I

orf 138-1 VKQIIKALRSGEATIVLPDHVPSPQEGGEGWVDFFGKPAYTMTLAAKLAHVKGVKTLFF 
orf138a.pep CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP

| | | | | ] | || | | | I I I M I II I I I I I I I I I M I I I I I I M II I II I I M I M I I I I I I I
orf138-1 CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP



Homology with a predicted ORF from N. gonorrhoeae 

ORF138 shows 94.3% identity over a 123aa overlap with a predicted ORF (ORF138ng) from 
N. gonorrhoeae:

orf138.pep MFRLQFRLFPPLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAX 60
| | | | | | M I I II I I I I I I I I I I M I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I
orf138ng MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARIVAN 60

orf138.pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 120
Ill : I II I I I I I I I I 1 M I : I I I I I I I I M II II I I
orf138ng MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG 120

orf138.pep LLF 123
orf138ng LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG 180

The complete length ORF138ng nucleotide sequence <SEQ ID 571> is:

1 ATGTTTCGTT TACAATTCAG GCTGTTTCCC CCTTTGCGAA CCGCCATGCA
51 CATCCTGTTG ACCGCCCTGC TCAAATGCCT CTCCCTGCTG TCGCTTTCCT
101 GTCTGCACAC GCTGGGAAAC CGGCTCGGAC ATCTGGCGTT TTACCTTTTA
151 AAGGAAGACC GCGCGCGCAT CGTCGCCAAT ATGCGGCAGG CGGGTTTGAA
201 CCCCGACACG CAGACGGTCA AAGCCGTTTT TGCGGAAACG GCAAAATGCG
251 GTTTGGAACT TGCCCCCGCG TTTTTCAAAA AACCGGAAGA CATCGAAACA
301 ATGTTCAAAG CGGTACACGG CTGGGAACAC GTGCAGCAGG CTTTGGACAA
351 GGGCGAAGGG CTGCTGTTCA TCACGCCGCA CATCGGCAGC TACGATTTGG
401 GCGGACGCTA CATCAGCCAG CAGCTTCCGT TCCACCTGAC CGCCATGTAC
451 AAGCCGCCGA AAATCAAAGC GATAGACAAA ATCATGCAGG CGGGCAGGGT
501 GCGCGGCAAA GGCAAAACcg cgcccaccgg catACAAGGG GTCAAACAAA
551 tcatcaAGGC CCTGCGCGCG GGCGAGGCAA CCAtcATCCT GCCCGACCAC
601 GTCCCTTCTC CGCAGGAagg cggCGGCGTG TGGGCGGATT TTTTCGGCAA
651 ACCTGCATAc acCATGACAC TGGCGGCAAA ATTGGCACAC GTCAAAGGCG
701 TGAAAACCCT GTTTTTCTGC TGCGAACGCC TGCCCGACGG ACAAGGCTTC
751 GTGTTGCACA TCCGCCCCGT CCAAGGGGAA TTGAACGGCA ACAAAGCCCA
801 CGATGCCGCC GTGTTCAACC GCAATACCGA ATATTGGATA CGCCGTTTTC
851 CGACGCAGTA TCTGTTTATG TACAACCGCT ATAAAACGCC GTAA

This encodes a protein having amino acid sequence <SEQ ID 572>: 

1 MFRLQFRLFP PLRTAMHILL TALLKCLSLL SLSCLHTLGN RLGHLAFYLL
51 KEDRARIVAN MRQAGLNPDT QTVKAVFAET AKCGLELAPA FFKKPEDIET
101 MFKAVHGWEH VQQALDKGEG LLFITPHIGS YDLGGRYISQ QLPFHLTAMY
151 KPPKIKAIDK IMQAGRVRGK GKTAPTGIQG VKQIIKALRA GEATIILPDH
201 VPSPQEGGGV WADFFGKPAY TMTLAAKLAH VKGVKTLFFC CERLPDGQGF
251 VLHIRPVQGE LNGNKAHDAA VFNRNTEYWI RRFPTQYLFM YNRYKTP*
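
The relationship between a nucleotide listing and the protein it encodes can be checked with a short translation routine. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative and do not come from this specification. Codons containing anything other than A, C, G or T are reported as `X`, which is an assumed convention for ambiguous bases.

```python
from itertools import product

# Standard genetic code, built in TCAG order (an assumption: the
# specification does not state which translation table was used).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; whitespace and letter case are ignored."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # drop an incomplete trailing codon
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")  # ambiguous codon -> X
        for i in range(0, usable, 3)
    )


# First 30 nucleotides of the ORF138ng listing (positions 1-30).
print(translate("ATGTTTCGTT TACAATTCAG GCTGTTTCCC"))  # MFRLQFRLFP
# Last 15 nucleotides of the same listing (positions 880-894).
print(translate("TATAAAACGCCGTAA"))                    # YKTP*
```

Both results agree with the start and end of the amino acid listing above, where `*` marks the stop codon.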

ORF138ng and ORF138-1 show 94.3% identity over a 299aa overlap:

orf 138-1 . pep MFRLQFRLFP PLRTAMHILLTALLKCLSLLPLSCLHTLGNRLGHLAFYLLKEDRARIVAN 
50 I I I I I I I I I I I 1 I I I I I I I I M-l I I I II I I I 1 I I I I I I II II II I I I I I I I I I I I I I M 

orfl38ng MFRLQFRLFP PLRTAMHILLTALLKCLSLLSLSCLHTLGNRLGHLAFYLLKEDRARI VAN 

orf 138-1 . pep MRQAGLNPDPKTVKAVFAETAKGGLELAPAFFRKPEDIETMFKAVHGWEHVQQALDKHEG 
I II I II I II : I I I I I I! I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I II 
orf138ng MRQAGLNPDTQTVKAVFAETAKCGLELAPAFFKKPEDIETMFKAVHGWEHVQQALDKGEG

orf 138-1. pep LLFITPHIGSYDLGGRYISQQLPFPLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTSIQG 
I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I II I I I I I I I II I I I I I I I I II I I : I I I 
orfl38ng LLFITPHIGSYDLGGRYISQQLPFHLTAMYKPPKIKAIDKIMQAGRVRGKGKTAPTGIQG 
orf138-1.pep VKQIIKALRSGEATIVLPDHVPSPQEGGEGVWVDFFGKPAYTMTLAAKLAHVKGVKTLFF
| | | | | | | | | : | | I 1 I : I 1 I I I I I I I I M U I : I I I M I 1 I I 11 I I ! 1 I 1 I 1 I 1 I I M I I 
orfl38ng VKQIIKALRAGEATIILPDHVPSPQEGG-GWADFFGKPAYTMTLAAKLAHVKGVKTLFF 

orf138-1.pep CCERLPGGQGFDLHIRPVQGELNGDKAHDAAVFNRNAEYWIRRFPTQYLFMYNRYKMP

P P | | | | | | MM I I I I I I I I I I I I: I I I I I : I I I I I I 1 I 1 I II II I I M 1 I 

orf 13 8ng CCERLPDGQGFVLHIRPVQGELNGNKAHDAAVFNRNTEYWIRRFPTQYLFMYNRYKTP 

In addition, ORF138ng is homologous to the htrB protein from Pseudomonas fluorescens:

gnl|PID|e334283 (Y14568) htrB [Pseudomonas fluorescens] Length = 253
Score = 80.8 bits (196), Expect = 9e-15
Identities = 49/151 (32%), Positives = 79/151 (51%), Gaps = 6/151 (3%)

Query: 101 MFKAVHGWEHVQQALDKGEGLLFITPHIGSYD-LGGRYISQQLPFHLTAMYKPPKIKAID 159
+ + V G E +++AL G+G++ IT H+G+++ L Y SQ P Y+PPK+KA+D
Sbjct: 94 LVREVEGLEVLKEALASGKGWGITSHLGNWEVLNHFYCSQCKPI IFYRPPKLKAVD 150

Query: 160 KIMQAGRVRGKGKTAPTGIQGVKQIIKALRAGEATIILPDHVPSPQEGGGVWADFFGKPA 219
++++ RV + K A + +G+ +IK +R G I D P P E G++ FF A
Sbjct: 151 ELLRKQRVQLGNKVAASTKEGILSVIKEVRKGGQVGIPAD--PEPAESAGIFVPFFATQA 208

Query: 220 YTMTLAAKLAHVKGVKTLFFCCERLPDGQGF 250
T + +F RLPDG G+
Sbjct: 209 LTSKFVPNMLAGGKAVGVFLHALRLPDGSGY 239

Based on this analysis, including the presence of a putative transmembrane domain in the
gonococcal protein, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, and
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.
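
The specification does not state how the putative transmembrane domain was predicted. As an illustration only, one widely used screen is a sliding-window average of Kyte-Doolittle hydropathy values, where a 19-residue window averaging above about 1.6 is a commonly cited heuristic for a membrane-spanning segment. The sketch below applies that screen to the first 40 residues of the ORF138ng listing; the names `KD`, `hydropathy_profile` and `N_TERM` are hypothetical, and this is not presented as the method the authors used.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


# Residues 1-40 of the ORF138ng amino acid listing.
N_TERM = "MFRLQFRLFPPLRTAMHILLTALLKCLSLLSLSCLHTLGN"

profile = hydropathy_profile(N_TERM)
best = max(range(len(profile)), key=profile.__getitem__)
# Report the 1-based start of the most hydrophobic window and its mean score.
print(best + 1, round(profile[best], 2))
```

A window scoring above the heuristic threshold only flags a candidate segment; it does not establish membrane topology.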

ORF138-1 (57kDa) was cloned in the pGex vectors and expressed in E. coli, as described above.
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 14A
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein
was used to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis
(Figure 14B). These experiments confirm that ORF138-1 is a surface-exposed protein, and that it
is a useful immunogen.

Example 69 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 573>: 

1 GCGTGGTCGG CCGGCGAATC GTGGCGTGTG TTAATGGAAA GTGAAACGTG
51 GCATGCGGTG TGGAATACTT TGCGCTTCTC GGCGGCGGCG GTGTATGCGG
101 CAGCGGTTTT GGGTGTGGTG TATGCGGCGC CGGCGCGGCG GTCGGCGTGG
151 ATGCGCGGGC TGATGTTTTA GCCGTTTATG GTGTCGCCGG TTTGTGTTTC
201 GGCGGGCGTG CTGCTGCTTT ATCCGCAGTG GACGGCTTCG TTGCCGTTGC
251 TGCTGGCGAT GTATGCGCTG CTGGCGTATC CGTTTGTGGC AAAAGATGTT
301 TTATCAGCCT GGGATGCACT GCCGCCGGAT TACGGCAGGG CGGCGGCGGG
351 TTTGGGTGCA AACGGCTTTC AGACGGCATG CCGCATCACG TTCCCCCTCT
401 TGAAACCGGC GTTGCGGCGC GGTCTGACTT TGGCGGCGGC AACCTGCGTG
451 GGCGAATTTG CGGCGACATT GTTTCTGTCG CGTCCGGAAT GGCAGACGCT
501 GACGACTTTG ATTTATGCCT ATTTGGGACG CGCGGGTGAG GATAATTACG
551 CGCGGGCGAT GGTGCTG..

This corresponds to the amino acid sequence <SEQ ID 574; ORF139>:



1 . .iWSAGESWRV LMESETWHAV WNTLRFSAAA VYAAAVLGVV YAAPARRSAW 
51 MRGLMFXPFM VSPVCVSAGV LLLYPQWTAS LPLLLAMYAL LAYPFVAKDV 
101 LSAWDALPPD YGRAAAGLGA NGFQTACRIT FPLLKPALRR GLTLAAATCV
151 GEFAATLFLS RPEWQTLTTL IYAYLGRAGE DNYARAMVL . . 

Further work revealed the complete nucleotide sequence <SEQ ID 575>: 

1 ATGGATGGAC GGCGTTGGGT GGTATGGGGT GCTTTTGCCC TGCTGCCTTC
51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT
101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA
151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT
201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG
251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG
301 TTGGTGGCGG GCGTGGGCGT GCTGGCCCTG TTCGGGGCGG ACGGGCTGTT
351 GTGGCGCGGC AGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT
401 TTTTCAACCT TCCTGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGTGCAA
451 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG
501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG
551 GCGGCGTGTG CCTTGTCTTT CTGTATTGTT TTTCCGGGTT CGGGCTGGCG
601 CTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA
651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTGGTGTGGC
701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC
751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCTGTGATGC CGTCGCCGCC
801 GCAGTCGGTC GGGGAATATG TGCTGCTGGC GTTTGCGGCG GCGGTGTTGT
851 CTGTGTGCTG CCTGTTTCCT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG
901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGTGAAACGT GGCAGGCGGT
951 GTGGAATACT TTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT
1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG
1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT
1101 GCTGCTGCTT TATCCGCAGT GGACGGCTTC GTTGCCGTTG CTGCTGGCGA
1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC
1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC
1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG
1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT
1351 GCGGCGACAT TGTTTCTGTC GCGTCCGGAA TGGCAGACGC TGACGACTTT
1401 GATTTATGCC TATTTGGGAC GCGCGGGTGA GGATAATTAC GCGCGGGCGA
1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT TTTCCTGCTG
1501 TTGGACGGCG GCGAAGGCGG AAAACAGACG GAAACGTTAT AA

This corresponds to the amino acid sequence <SEQ ID 576; ORF139-1>:

1 MDGRRWVVWG AFALLPSAFL AVMVVAPLWA VAAYDGLAWR AVLSDAYMLK
51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT
101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFVQ
151 VPAARLQTAR TLGAGAWRRF WDIEMPVLRP WLAGGVCLVF LYCFSGFGLA
201 LLLGGSRYAT VEVEIYQLVM FELDMAVASV LVWLVLGVTA AAGLLYAWFG
251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFAA AVLSVCCLFP LLAIVVKAWS
301 AGESWRVLME SETWQAVWNT LRFSAAAVYA AAVLGVVYAA AARRSAWMRG
351 LMFLPFMVSP VCVSAGVLLL YPQWTASLPL LLAMYALLAY PFVAKDVLSA
401 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF
451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL AAFALGIFLL
501 LDGGEGGKQT ETL*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF139 shows 94.7% identity over a 189aa overlap with an ORF (ORF139a) from strain A of N.
meningitidis:

10 20 30 

orf 139 .pep AWSAGESWRVLMESETWHAVWNTLRFSAAA 

I I I I I I ! I I I M I M I I : I I I I I MINI 
orf 13 9a QSVGEYVLLAFA AAVXSVCCLFXLLAIVV KAWSAGESWRVLMESETWQAVWMTXRFS AAA 
55 270 280 290 300 310 320 

40 50 60 70 80 90 

orf 139 .pep VYAAAVLGVVYAAPA RRSAWMRGLMF XPFMVSPVCVSAGVLLL YPQWTAS LPLLLAMYAL 
orf139a VYAAAVLGVVYAAAARRSAWMRGLMFLPFMVSPVCVSAGVLLLXPQWTASLPLLLAMYAL
330 340" 350 360 370 380 

100 110 120 130 140 150 

orfl39 pep LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 

p p i inn 1 1 1 1 1 m 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 

orfl39a LAYPFVAKDVLSAXDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 
390 400 410 420 430 440 

160 170 180 189 

orfl39 pep GEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNYARAMVL 

I I I I I I I I I I III,' I I I I I I I I I 1 I I I 

orf139a GEFAATLFXSRXEWQTLTTLIYAYXGRAGXDNYARAMVLTLLLAAFALGXFLLLDGGEGG

450 460 470 480 490 500 

The complete length ORF139a nucleotide sequence <SEQ ID 577> is: 

1 ATGGATGGAC GGCGTTGGGC GGTATGGGGT GCTTTTGCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGCAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTATT TCAGGCAGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTTCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCTTTTGT GATGCCCACG 

301 TTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGCCTGTN 

351 GTGGCGCGGC TGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

4 01 TTTTTNACCT TCCTGTGTTG GTCAGGGCGG CATATCAGGG GTTTGTGCAA 

451 GTGCCTGCGG CACGGCTTCA GACGGCACNG ACATTGGGCG CGGGGGCGTG 

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA 

601 TTGCTGCTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTCATG TTCGAACTCG ATATGGCGGT TGCTTCGGTG CTNGTGTGGC 

701 TGGTGTNGGG GGTAACNGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 

751 AGGCGCGCGG TTTCGGATAA GGCNGTTTCC CCTGTGATGC CGTCGCCGCC 

801 GCAGTCGGTC GGGGAATATG TGCTNCTGGC GTTTGCGGCG GCGGTGTNGT 

851 CTGTGTGCTG CCTGTTTCNT TTGTTGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGTGGCGTGT GTTAATGGAA AGT GAAACGT GGCAGGCGGT 

951 GTGGAATACT NTGCGCTTCT CGGCGGCGGC GGTGTATGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGTCGGCGTG GATGCGCGGG 

1051 CTGATGTTTT TGCCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT NATCCGCAGT GGACGGCTTC GTTGCCGCTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCAGCC 

12 01 TGNGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCGG GTTTGGGTGC 

1251 AAACGGCTTT CAGACGGCAT GCCGCATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CAACCTGCGT GGGCGAATTT 

1351 GCGGCAACCT TGTTCNTGTC GCGTCNCGAG TGGCAGACGC TGACGACTTT 

14 01 GATTTATGCC TATNTGGGAC GCGCGGGTGA NGATAATTAC GCGCGGGCGA 

1451 TGGTGCTGAC ATTGCTGTTG GCGGCGTTCG CGCTGGGTAT NTTCCTGCTG 

1501 TTGGACGGCG GCGAAGGCGG AAAACGGACG GAAACGTTAT AA 

This encodes a protein having amino acid sequence <SEQ ID 578>: 

1 MDGRRWAVWG AFALLPSAFL AAMWAPLWA VAAYDGLAWR AVLS DAYMLK 

51 RLAWTVFQAA ATCVLVLPLG VPVAWV LARL AFPGRALVLR LLMLPFVMPT 

101 LVAGVGVLAL FGA DGLXWRG WQDTPYLLLY GNVFFXL PVL VRAAYQGFVQ 

151 VPAARLQTAX TLGAGAWRRF WDIEMPVLRP WLAGG VCLVF LYCFSGFGLA 

201 LLLGGSRYAT VEVEIYQLVM FELDMAV ASV LVWLVXGVTA AAGLL YAWFG 

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAF AA AVXSVCCLFX LLAIW KAWS 

301 AGE S WRVLME SETWQAWNT XRFS AAAVYA AAVLGVVYAA AA RRSAWMRG 

351 LMF LPFMVSP VCVSAGVLLL XPQWTAS LPL LLAMYALLAY PFVA KDVLSA 

4 01 XDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF 

451 AATLFXSRXE WQTLTTLIYA YXGRAGXDNY ARA MVLTLLL AAFALGXFLL 

501 LDGGEGGKRT ETL* 

ORF139a and ORF139-1 show 96.5% homology over a 514aa overlap: 

orf 139a . pep MDGRRWAVWGAFALLPSAFLAAMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 

I I I I I I : I I I I I I I I I I I I I I : I I I I I I I I ! I I I I I I I I I I I I I I I I I I I II M I I I I I I 
orf 139-1 MDGRRWVWGAFALLPSAFLAVMWAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 
orf139a.pep ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLXWRG
I I I I I I I I I I I I I I I I I I I I I 1 I I I 1 1 I II I I 1 I I I I I I M I I I I I I I I I I I I I I I III 
orf 139-1 ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 

orf 139a . pep WQDTPYLLLYGNVFFXLPVLVRAAYQGFVQVPAARLQTAXTLGAGAWRRFWDIEMPVLRP 
I I I I I I I M II II I I I I I I I II I I 1 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 
orfl39-l RQDT PYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQT ART LGAGAWRRFWD IEMPVLRP 

orf 13 9a . pep WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVXGVTA 

I M M M M M II I M M II I M M I I I I I I N I I I I I I I I I I I I I I I I I I I I II I I I I 

orf 139-1 WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA 

orf 13 9a . pep AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVXSVCCLFXLLAIWKAWS 
I II II I I I I I I I II I I I I I I I I I I I I I I 1 I II I I I I I II I I I I If If I I i I I I I I I I I 
orf 139-1 AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIVVKAWS 

orf 13 9a . pep AGESWRVLMESETWQAVWNTXRFSAAAVYAAAVLGWYAAAARRSAWMRGLMFLPFMVSP 
I I I I I I I I I I I II I I I I I II I I M I I 1 I I I I I I I I I I I I I I I I I I M II I I M II I I II 
orf 139-1 AGES WRVLME SET WQAVWNTLRFSAAAVYAAAVLGVVYAAAARRSAWMRGLMFLPFMVSP 

orf 139a . pep VCVS AGVLLLX PQWT AS L PLLLAMYALLAYP FVAKD VL SAX DAL PPD YGRAAAGLGANG F 
I I M II I I I I I I I I I I I I I I I I I I I M I I I I I II I I I I I I I I I I | | I | | i | || | | | || 
orf 13 9-1 VCVSAGVLLLYPQWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 

orf 13 9a. pep QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFXSRXEWQTLTTLIYAYXGRAGXDNY 
I M ! I M I I M I I I I I I I I I I II I I f I I I I I I I I I I I II II I II | | | | | Mil III 
orf 13 9-1 QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 

orf 13 9a . pep ARAMVLTLLLAAFALGXFLLLDGGEGGKRTETLX 
I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I 
orfl39-l ARAMVLTLLLAAFALGI FLLLDGGEGGKQTETLX 

Homology with a predicted ORF from N. gonorrhoeae

ORF139 shows 95.2% identity over a 189aa overlap with a predicted ORF (ORF139ng) from
N. gonorrhoeae:

orfl39.pep AW SAGE S WRVLME SET WHAVWNT LR F S AAA 30 

I I I I M I I I I I I I I I I : I I I I I II I I I I I 
orfl39ng QSVGEYVLLAFSVAVLSVCCLFPLSAIWKAWSAGESRRVLMESETWQAVWNTLRFSAAA 327 

orf 139 . pep VYAAAVLGWYAAPARRSAWMRGLMFXPFMVSPVCVSAGVLLLYPQWTASLPLLLAMYAL 90 

I : I I I I I I I I I I I III : I I I I I : I I I I I I I I II I I I I I I I II I I I I I I I I I I I II I 
orfl3 9ng VFAAAVLGWYAAAARRLVWMRGLVFLPFMVSPVCVSAGVLLLYPGWTASLPLLLAMYAL 387 

orf 13 9 . pep LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 150 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | II II I II I I I I I 
orf 1 3 9ng LAYPFVAKDVLSAWDALPPDYGRAAAGLGANGFQTACRITFPLLKPALRRGLTLAAATCV 4 47 

orf 139. pep GE FAAT LFL SRPEWQTLTT L I YAYLGRAGE DNYARAMVL 189 

I M M if I I I M I II I I II I I I I M I I I i M I I | N N I 
orfl3 9ng GE FAAT L FL S RPEWQT LTTL I YAYLGRAGE DNYARAMVLT LLL S AFAVC IFLLLDNGEGG 507 

The complete length ORF139ng nucleotide sequence <SEQ ID 579> is predicted to encode a 
protein having amino acid sequence <SEQ ID 580>: 

1 MDGRCWAVRG AFSLLPSAFL AVMWAPLWA VAAYDGLAWR AVLSDAYMLK 

51 RLAWTVFQAA ATCVLVLPLG VPVAWVLARL AFPGRALVLR LLMLPFVMPT 

101 LVAGVGVLAL FGADGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQG FAQ 

151 VPAARLQTAR TLGAGAWRPF WDIEMP VLRP WLAGGVCLVF LYCFSGFGLA 

201 LLLGGSRYAT VEVEIYQLVM FELDMAGASA LVWLVLGVTA AAGLLYAWFG 

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFSV AVLSVCCLFP LSAIWKAWS 

301 AGESRRVLME SETWQAVWNT LRFSAAAVFA AAVLGWYAA AARRLVWMRG 

351 LVFLPFMVSP VCVS AG VLLL YPGWTASLPL LLAMYALLAY PFVAKDVLSA 

4 01 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF 

451 AATLFLSRPE WQTLTTLIYA YLGRAGEDNY ARAMVLTLLL SAFAVCIFLL 

501 LDNGE GGKRT ETL* 
Further work revealed a variant gonococcal DNA sequence <SEQ ID 581>:

1 ATGGATGGAC GGTGTTGGGC GGTACGGGGT GCTTTTTCCC TGCTGCCTTC 

51 GGCTTTTTTG GCGGTAATGG TCGTTGCGCC TTTGTGGGCG GTGGCGGCGT 

101 ATGACGGTTT GGCGTGGCGC GCGGTGCTGT CGGATGCCTA TATGCTCAAA 

151 CGTTTGGCGT GGACGGTGTT TCAGGCGGCG GCAACCTGTG TGCTGGTGCT 

201 GCCTTTGGGC GTGCCTGTCG CGTGGGTGCT GGCGCGGCTG GCGTTCCCGG 

251 GGCGGGCTTT GGTGCTGCGC CTGCTGATGC TGCCGTTTGT GATGCCCACG 

301 CTGGTGGCGG GCGTGGGCGT GCTGGCTCTG TTCGGGGCGG ACGGGCTGTT 

351 GTGGCGCGGC CGGCAGGATA CGCCGTATCT GTTGTTGTAC GGCAATGTGT 

4 01 TTTTCAACCT GCCCGTGTTG GTCAGGGCGG CGTATCAGGG GTTTGCTCAA 

4 51 GTGCCTGCGG CACGGCTTCA GACGGCACGG ACGTTGGGCG CGGGGGCGTG 

501 GCGGCGGTTT TGGGACATTG AAATGCCCGT TTTGCGCCCG TGGCTTGCCG 

551 GCGGCGTGTG CCTTGTCTTC CTGTATTGTT TTTCGGGGTT CGGGCTGGCA 

601 TTGCTGTTGG GCGGCAGCCG TTATGCCACG GTCGAAGTGG AAATTTACCA 

651 GTTGGTTATG TTCGAACTCG ATATGGCGGG GGCTTCGGCG CTGGTGTGGC 

701 TGGTGTTGGG GGTAACGGCG GCGGCAGGGT TGCTGTATGC GTGGTTCGGC 

751 AGGCGCGCGG TTTCGGATAA GGCGGTTTCC CCCGTGATGC CGTCGCCGCC 

801 GCAATCGGTG GGGGAATATG TATTGCTGGC ATTTTCGGTG GCGGTGTTGT 

851 CCGTGTGCTG CCTGTTTCCT TTGTCGGCAA TTGTTGTGAA AGCGTGGTCG 

901 GCCGGCGAAT CGCGGCGTGT GTTAATGGAA AGT GAAACGT GGCAGGCAGT 

951 GTGGAATACt ttGCGCTTTT CGGCGGCGGC GGTGTTTGCG GCGGCGGTTT 

1001 TGGGTGTGGT GTATGCGGCG GCGGCGCGGC GGCTGGTGTG GATGCGCGGA 

1051 CTGGTGTTTT TACCGTTTAT GGTGTCGCCG GTTTGTGTTT CGGCGGGCGT 

1101 GCTGCTGCTT TATCCGGGGT GGACGGCTTC GTTACCGCTG CTGCTGGCGA 

1151 TGTATGCGCT GCTGGCGTAT CCGTTTGTGG CAAAAGATGT TTTATCGGCC 

1201 TGGGATGCAC TGCCGCCGGA TTACGGCAGG GCGGCGGCAG GTTTGGGCGC 

1251 AAACGGCTTT CAGACGGCAT GCCGTATCAC GTTCCCCCTC TTGAAACCGG 

1301 CGTTGCGGCG CGGTCTGACT TTGGCGGCGG CGACGTGTGT GGGCGAATTT 

1351 GCGGCAACCT TGTTCCTGTC GCGT CCGGAA TGGCAGACGT TGACGACTTT 

14 01 GATTTATGCC TATTTGGGGC GTGCGGGTGA GGACAATTAT GCGCGGGCAA 

1451 TGGTGTTGAC ATTGCTGTTG TCGGCATTTG CGGTGTGCAT TTTCCTGCTG 

1501 TTGGACAACG GCGAAGGCGg aaaACGGACG GAAACGTTAT AA 

This corresponds to the amino acid sequence <SEQ ID 582; ORF139ng-l>: 

1 MDGRCWAVRG AFSLLPSAFL AVMWAPLWA VAAYDGLAWR AVLS DAYMLK 

51 RLAWTVFQAA ATCVLVLPLG VPVAWVL ARL AFPGRALVLR LLMLPFVMPT 

101 LVAGVGVLAL FGA DGLLWRG RQDTPYLLLY GNVFFNLPVL VRAAYQGFAQ 

151 VPAARLQTAR TLGAGAWR.RF WDIEMPVLRP WLAGG VCLVF LYCFSGFGLA 

201 LLLGGSRYAT VEVEIYQLVM FELDMAG ASA LVWLVLGVTA AAGLL YAWFG 

251 RRAVSDKAVS PVMPSPPQSV GEYVLLAFS V AVLSVCCLFP LSAIW KAWS 

301 AGESRRVLME SETWQAVWNT LRFS AAAVFA AAVLGWYAA AA RRLVWMRG 

351 LVF LPFMVSP VCVSAGVLLL YPGWTASL PL LLAMYALLAY PFVA KDVLSA 

4 01 WDALPPDYGR AAAGLGANGF QTACRITFPL LKPALRRGLT LAAATCVGEF 

451 AATLFLSRPE WQTLTTLIYA YLGRAGE DNY ARAM VLTLLL SAFAVCIFLL 

501 LDNGEGGKRT ETL* 

ORF139ng-1 and ORF139-1 show 95.9% identity over a 513aa overlap:

or f 1 3 9ng MDGRCWAVRGAFSLLPSAFLAVMVVAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 
MM I : I M I : I i M M M M M M M M M M M M M I M M M M M M M 1 M I 
orf 139-1 MDGRRWVVWGAFALLPSAFLAVMWAPLWAVAAYDGLAWRAVLSDAYMLKRLAWTVFQAA 

orfl39ng ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 

M M M M M M M M M M M M M M M M M I I I M M I M M M M M M M M M 
or f 1 3 9-1 ATCVLVLPLGVPVAWVLARLAFPGRALVLRLLMLPFVMPTLVAGVGVLALFGADGLLWRG 

orfl39ng RQDTPYLLLYGNVFFNLPVLVRAAYQGFAQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 
I M M M M M M M I M M M I M M I : M M M M I M M M M M I M M M M M I 
orf 139-1 RQDTPYLLLYGNVFFNLPVLVRAAYQGFVQVPAARLQTARTLGAGAWRRFWDIEMPVLRP 

orfl39ng WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAGASALVWLVLGVTA 
M M M M 1 M M M M M M M M M M M M M M M M M M I I I : I I I I 1 ! ! I ! I 
orf 139-1 WLAGGVCLVFLYCFSGFGLALLLGGSRYATVEVEIYQLVMFELDMAVASVLVWLVLGVTA 

orfl39ng AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFSVAVLSVCCLFPLSAIVVKAW3 
I M M M M M M M M M M M M M M M M M M I : : M M M M M I M M I I I I 
orf 139-1 AAGLLYAWFGRRAVSDKAVSPVMPSPPQSVGEYVLLAFAAAVLSVCCLFPLLAIVVKAWS 






orfl39ng AGESRRVLMESETWQAWNTLRFSAAAVFAAAVLGWYAAAARRLVWMRGLVFLPFIWSP 
I I I I I I I M I I I I I I I j I [ I I I I I I | | : | | | | | | | | | | | | | | | : | | | ! | : | | | | | M I 
orf 13 9 AGESWRVLME3ETWQAWNTLRFSAAAVYAAAVLGVVYAAAARRSAWMRGLMFLPFMVSP 

orfl3 9ng VCVSAGVLLLYPGWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 
I I I I I I I I I I I I I I I II I I I I I I I I | | | | | | | | | | | | | | | | | | ! | | | | | | | | | | H|| j 
orf 13 9-1 VCVSAGVLLLYPQWTASLPLLLAMYALLAYPFVAKDVLSAWDALPPDYGRAAAGLGANGF 

orfl39ng QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 

, I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | | | | | || | M 1 | | | | | | 

orf 13 9-1 QTACRITFPLLKPALRRGLTLAAATCVGEFAATLFLSRPEWQTLTTLIYAYLGRAGEDNY 

orfl39ng ARAMVLT LLLSAFAVC I FLLLDNGEGGKRTET L 

I M I I I I I I I : I II : I I I I I I : I I I II : I I I I 
orf 13 9-1 ARAMVLTLLLAAFALGIFLLLDGGEGGKQTETL 

Based on the presence of a predicted binding-protein-dependent transport systems inner membrane 
component signature (underlined) in the gonococcal protein, it is predicted that the proteins from 
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or 
diagnostics, or for raising antibodies. 

Example 70 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 583>: 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC
51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAGA TTCCGCATCC
101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC
151 GGTTTGCCCA CAGGCAGCAT TGTCAAAGAC ATACTGGTCA AAAACTTCGG
201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG
251 AACGTTTGGT C...

This corresponds to the amino acid sequence <SEQ ID 584; ORF140>: 

1 MDGWTQTLSA QTLLGISAAA IILILILIVR FRIHALLTLV IVSLLTALAT 
51 GLPTGSIVKD ILVKNFGGTL GGVALLVGLG AMLERLV. . 

Further work revealed the complete nucleotide sequence <SEQ ID 585>: 

1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC
51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC
101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC
151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC ATACTGGTCA AAAACTTCGG
201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG
251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG
301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC
351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC
401 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC
451 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC
501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG
551 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC
601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT
651 ....GAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC
701 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG
751 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACTCGTAAG
801 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA
851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA
901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC
951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG
1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG
1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT
1101 .......GCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC
1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC
1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA
1251 CGACTCCGGC TTCTGGCTGG TCGGCCGTCT CTTGGACATG GACGTACCGA
1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC
1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA

This corresponds to the amino acid sequence <SEQ ID 586; ORF140-1>: 

1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT
51 GLPTGSIVND ILVKNFGGTL GGVALLVGLG AMLGRLVETS GGAQSLADAL
101 IRMFGEKRAP FALGVASLIF GFPIFFDAGL IVMLPIVFAT ARRMKQDVLP
151 FALASIGAFS VMHVFLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF
201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPAKAGTV VAIMLIPMLL
251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG STPIALLISV LVALFVLGRK
301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVLRASG IGKALADSMA
351 DLGIPVLLGC FLVALALRIA QGSATVALTT AAALMAPAVA AAGFTDWQLA
401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQTLIALIG
451 FALSALLFAI V*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF140 shows 95.4% identity over a 87aa overlap with an ORF (ORF140a) from strain A of N. 
meningitidis: 



10 20 30 40 50 60 

orfl40.pep MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATG LPTGSIVKD 
1 I i I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I IN I i I I I I I I M I I I : I 
orfl40a MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATG LPTGSIVND 

10 20 30 40 50 60 



orfl4 0.pep ILVKNFGGTL GGVALLVGLGAMLERLV 
: I I I I I I I I I I I I II I I I I I I I I Ml 
orf 140a VLVKNFGGTL GGVALLVGLGAMLGRLV ETSGGAQSLADALIRMFGEKRAP FALGVASLIF 

70 80 90 100 110 120 

The complete length ORF140a nucleotide sequence <SEQ ID 587> is:



1 ATGGACGGCT GGACACAGAC GCTGTCCGCG CAAACCCTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 ACGCGCTGCT GACACTGGTC ATCGTCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT TGTCAACGAC GTACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGCCTGGGC GCGATGCTCG 

251 GACGTTTGGT CGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCGCTGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

4 01 TGCCCATCGT GTTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

4 51 TTCGCGCTTG CCTCCATCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC 
501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG 

5 51 GCCAAGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC 
601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCACCATCC ATGTTCCCGT 
651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAACGACCTG CCGAAAGAAC 
7 01 CTGCCAAAGC AGGAACGGTC GTCGCCATCA TGCTGATTCC CATGCTGCTG 

7 51 ATTTTCCTGA ATACCGGCGT ATCGGCCCTC ATCAGCGAAA AACT CGTAAG 

8 01 TGCGGACGAA ACCTGGGTTC AGACGGCAAA AATAATCGGT TCGACACCGA 
851 TCGCCCTTCT GATTTCCGTA TTGGTCGCAC TGTTTGTCTT GGGACGCAAA 
901 CGCGGCGAAA GCGGCAGCGC GTTGGAAAAA ACCGTGGACG GCGCACTCGC 
951 CCCCGTCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG 

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG 

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGT TTCCTTGTCG CCTTGGCACT 

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACC GCCGCCGCGC 

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC 

1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA 

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGACATG GACGTACCGA 

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ACTCATCGGC 

1351 TTTGCCTTGT CCGCACTGCT GTTCGCCATC GTCTGA 
This encodes a protein having amino acid sequence <SEQ ID 588>:

1 MDGWTQTLSA QTLLGISAAA IILILILIVK FRIHALLTLV IVSLLTALAT 

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLV ET S GGAQSLADAL 

101 IRMFGEKRAP FALGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQDVLP 

151 FALASIGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RTIHVPVPEL LSGGTQDNDL PKEPA KAGTV VAIMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKIIG S TPIALLISV LVALFVLG RK 

301 RGESGSALEK TVDGALAPVC SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AA GFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQT LIALIG 

451 FALSALLFAI V * 

ORF140a and ORF140-1 show 99.8% identity over a 461aa overlap: 

orfl40-l pep MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 60 

I I I I I I I I I I I I I [ I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I HIM 

orf 14 0a MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND 60 

orfl40-l .pep ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF 120 
: I I I I I I I M II I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I 1 I I I I I I M I I I I II I I 

orf 14 0a VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAP FALGVASLIF 120 

orf 140-1 .pep GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 180 

I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I II I I I I II I I I II I I I I I 
orf140a GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG 180

orf 140-1. pep ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 240 

orf 14 0a ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV 240 

orf 140-1 .pep VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK 300 

M I I I I I I I I I I I I I I II I I I i I I I I I I I I I I I I I I I I 1 II I I I I II II I I I I I I I I I I I 
orf 14 0a VAIMLI PMLLIFLNTGVSALI SEKLVS ADETWVQTAKI IGSTPIALLI SVLVALFVLGRK 300 

orf 140-1 .pep RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 360 

I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I 
orf 140a RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC 3 60 

orf 140-1 .pep FLVALALRIAQGSATVALTTAAALMAPAVAAAGFT DWQLACIVLATAAGS VGCSHFNDSG 420 

I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I 
orf 1 4 0a FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG 4 20 

orf 140-1. pep FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 461 

I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I 
orfl40a FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV 461 

Homology with a predicted ORF from N. gonorrhoeae 

ORF140 shows 92% identity over a 87aa overlap with a predicted ORF (ORF140ng) from 
N. gonorrhoeae: 

orf 140 .pep MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD 60 

III I II I I I I I I I I I I I I I I I I I I I I I I : I I I : I I I I I I I : I I I I I I I I I I I I I II I : I 
orfl40ng MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND 60 

orf 140. pep ILVKNFGGTLGGVALLVGLGAMLERLV 87 

: I I I I I I I I I I I I II I I I I I I I I III 
orfl40ng VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF 120 
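
The identity percentages quoted for these gap-free overlaps can be reproduced by counting matching columns. The following is a minimal sketch, assuming the two sequences are already aligned residue-for-residue with no gaps; the function name `percent_identity` and the constant names are illustrative. The 87-residue strings are copied from the ORF140, ORF140a and ORF140ng alignment rows shown above.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of aligned positions carrying the same residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# 87aa partial ORF140 and the overlapping N-terminal regions of the
# strain A (ORF140a) and gonococcal (ORF140ng) sequences.
ORF140 = (
    "MDGWTQTLSAQTLLGISAAAIILILILIVRFRIHALLTLVIVSLLTALATGLPTGSIVKD"
    "ILVKNFGGTLGGVALLVGLGAMLERLV"
)
ORF140A = (
    "MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND"
    "VLVKNFGGTLGGVALLVGLGAMLGRLV"
)
ORF140NG = (
    "MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND"
    "VLVKNFGGTLGGVALLVGLGAMLGRLV"
)

print(round(percent_identity(ORF140, ORF140A), 1))   # 95.4 (83 of 87)
print(round(percent_identity(ORF140, ORF140NG), 1))  # 92.0 (80 of 87)
```

Alignments that contain gaps, such as the ORF138ng and ORF138-1 comparison, need a convention for how gap columns are counted, which this sketch does not address.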

The complete length ORF140ng nucleotide sequence <SEQ ID 589> was predicted to encode a 
protein having amino acid sequence <SEQ ID 590>: 

1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT
51 SLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLV ETS GGAQSLADAL 
101 IRMFGEKRAP FAPGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQDVLP 
151 FALASVGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF

201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPAK AGTV VAVMLIPMLL 

251 IFLNTGVSAL ISEKLVSADE TWVQTAKMIG S TPVALLISV LAALLVLG RK 

301 RGESGSTLEK TVDGALAPA C SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AA GFTDWQLA 

4 01 CIVLATAAGS VGCSHFNDSG FWLVGRL SDM DVPTTLKTWT VNQT LIAFIG 

451 FALSALLFAI V * 

Further work revealed a variant gonococcal DNA sequence <SEQ ID 591>: 



1 ATGGACGGCC GGACACAGAC GCTGTCCGCG CAAACCTTGT TGGGCATTTC 

51 GGCGGCGGCA ATCATCCTCA TTCTGATTTT AATCGTCAAA TTCCGCATCC 

101 GCGCGCTGCT GACACTGGTC ATCGCCAGCC TGCTGACGGC TTTGGCAACC 

151 GGTTTGCCCA CAGGCAGCAT CGTCAACGAC GTACTGGTCA AAAACTTCGG 

201 CGGCACGCTC GGCGGCGTGG CGCTTCTGGT CGGTCTGGGC GCAATGCTCG 

251 GACGTTTGGT AGAAACATCC GGCGGCGCAC AGTCGCTGGC GGACGCGCTG 

301 ATCCGGATGT TCGGCGAAAA ACGCGCACCG TTCGCTCCGG GCGTTGCCTC 

351 GCTGATTTTC GGCTTCCCGA TTTTCTTCGA TGCCGGACTA ATCGTCATGC 

401 TGCCCATCGT ATTCGCCACC GCACGGCGCA TGAAACAGGA CGTACTGCCC 

451 TTCGCGCTTG CCTCCGTCGG CGCATTTTCC GTCATGCACG TCTTCCTGCC 

501 GCCCCATCCG GGCCCGATTG CCGCTTCCGA ATTTTACGGC GCGAACATCG 

551 GCCAGGTTTT GATTTTGGGT CTGCCGACCG CCTTCATCAC ATGGTATTTC 

601 AGCGGCTATA TGCTCGGCAA AGTGTTGGGG CGCGCCATCC ATGTTCCCGT 

651 TCCCGAACTG CTCAGCGGCG GCACGCAAGA CAGCGACCCG CCGAAAGAAC 

701 CTGCCAAAGC AGGAACGGTC GTCGCCGTCA TGCTGATTCC CATGCTGCTG 

751 ATTTTCCTGA ATACCGGCGT ATCAGCCCTC ATCAGCGAAA AACTCGTAAG 

801 TGCGGACGAA ACTTGGGTTC AGACGGCAAA AATGATCGGT TCGACACCTG 

851 TCGCCCTTCT GATTTCCGTA TTGGCCGCAC TGTTGGTCTT GGGACGCAAA 

901 CGCGGCGAAA GCGGCAGCAC GTTGGAAAAA ACCGTGGACG GCGCACTCGC 

951 CCCCGCCTGT TCCGTGATTC TGATTACCGG CGCGGGCGGT ATGTTCGGCG 

1001 GCGTTTTGCG CGCTTCCGGC ATCGGCAAGG CACTCGCCGA CAGCATGGCG 

1051 GATTTGGGCA TTCCCGTCCT TTTGGGCTGC TTCCTTGTCG CCTTGGCACT 

1101 GCGTATCGCG CAAGGTTCGG CAACCGTCGC CCTGACCACA GCCGCCGCGC 

1151 TGATGGCTCC TGCCGTTGCC GCCGCCGGCT TTACCGACTG GCAGCTCGCC 

1201 TGTATCGTAT TGGCAACGGC GGCAGGTTCG GTCGGTTGCA GCCACTTCAA 

1251 CGACTCCGGC TTCTGGCTGG TCGGCCGCCT CTTGGATATG GACGTACCGA 

1301 CCACGCTGAA AACCTGGACG GTCAACCAAA CCCTCATCGC ATTCATCGGC 

1351 TTTGCCTTGT CCGCACTGCT GTTTGCCATC GTCTGA 

This corresponds to the amino acid sequence <SEQ ID 592; ORF140ng-1>: 



1 MDGRTQTLSA QTLLGISAAA IILILILIVK FRIRALLTLV IASLLTALAT 

51 GLPTGSIVND VLVKNFGGTL GGVALLVGLG AMLGRLV ETS GGAQSLADAL 

101 IRMFGEKRAP FAPGVAS LIF GFPIFFDAGL IVML PIVFAT ARRMKQD VLP 

151 FALASVGAFS VMHV FLPPHP GPIAASEFYG ANIGQVLILG LPTAFITWYF 

201 SGYMLGKVLG RAIHVPVPEL LSGGTQDSDP PKEPA KAGTV VAVMLIPMLL 

251 IFL NTGVSAL ISEKLVSADE TWVQTAKMIG S TPVALLISV LAALLVLG RK 

301 RGESGSTLEK TVDGALAPAC SVILITGAGG MFGGVL RASG IGKALADSMA 

351 DLG IPVLLGC FLVALALRIA QGSAT VALTT AAALMAPAVA AA GFTDWQLA 

401 CIVLATAAGS VGCSHFNDSG FWLVGRLLDM DVPTTLKTWT VNQT LIAFIG 

451 FALSALLFAI V * 
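
The correspondence between the nucleotide sequence <SEQ ID 591> and the amino acid sequence <SEQ ID 592> can be checked by translating the DNA in frame. The following is a minimal Python sketch, assuming the standard genetic code (NCBI translation table 1); the names CODON_TABLE and translate are illustrative and do not appear in the application. It translates only the first 30 and the last 6 bases of SEQ ID 591.

```python
# Standard genetic code (assumed: NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every codon to its one-letter amino acid ('*' marks a stop codon).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in reading frame 1; codons with unknown bases become 'X'."""
    seq = "".join(dna.split()).upper()          # drop the spaces used in the listing
    usable = len(seq) - len(seq) % 3            # ignore a trailing partial codon
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First three 10-base groups and the last six bases of SEQ ID 591.
print(translate("ATGGACGGCC GGACACAGAC GCTGTCCGCG"))
print(translate("GTCTGA"))
```

Under these assumptions the first ten codons give MDGRTQTLSA and the last two give V followed by a stop, in agreement with the beginning and end of the ORF140ng-1 listing.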

ORF140ng-1 and ORF140-1 show 96.3% identity over a 461aa overlap: 



orf140ng-1.pep  MDGRTQTLSAQTLLGISAAAIILILILIVKFRIRALLTLVIASLLTALATGLPTGSIVND
                ||| |||||||||||||||||||||||||||||:|||||||:||||||||||||||||||
orf140-1        MDGWTQTLSAQTLLGISAAAIILILILIVKFRIHALLTLVIVSLLTALATGLPTGSIVND



orf140ng-1.pep  VLVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFAPGVASLIF
orf140-1        ILVKNFGGTLGGVALLVGLGAMLGRLVETSGGAQSLADALIRMFGEKRAPFALGVASLIF



orf140ng-1.pep  GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASVGAFSVMHVFLPPHPGPIAASEFYG
                |||||||||||||||||||||||||||||||||||:||||||||||||||||||||||||
orf140-1        GFPIFFDAGLIVMLPIVFATARRMKQDVLPFALASIGAFSVMHVFLPPHPGPIAASEFYG

orf140ng-1.pep  ANIGQVLILGLPTAFITWYFSGYMLGKVLGRAIHVPVPELLSGGTQDSDPPKEPAKAGTV
                |||||||||||||||||||||||||||||||:|||||||||||||||:| ||||||||||
orf140-1        ANIGQVLILGLPTAFITWYFSGYMLGKVLGRTIHVPVPELLSGGTQDNDLPKEPAKAGTV
orf140ng-1.pep  VAVMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKMIGSTPVALLISVLAALLVLGRK

||:l||||||||||||lilMlil[IIIIIMIIMI: :| II 1111:1 I: 

orf140-1        VAIMLIPMLLIFLNTGVSALISEKLVSADETWVQTAKIIGSTPIALLISVLVALFVLGRK

orf140ng-1.pep  RGESGSTLEKTVDGALAPACSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC
                ||||||:|||||||||||:|||||||||||||||||||||||||||||||||||||||||
orf140-1        RGESGSALEKTVDGALAPVCSVILITGAGGMFGGVLRASGIGKALADSMADLGIPVLLGC

orf140ng-1.pep  FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG
                ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf140-1        FLVALALRIAQGSATVALTTAAALMAPAVAAAGFTDWQLACIVLATAAGSVGCSHFNDSG

orf140ng-1.pep  FWLVGRLLDMDVPTTLKTWTVNQTLIAFIGFALSALLFAIV
                |||||||||||||||||||||||||||:|||||||||||||
orf140-1        FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV
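
The percent identity quoted for an alignment such as the one above is the number of identical positions divided by the length of the overlap. The following Python sketch illustrates the calculation for an ungapped pair of aligned rows; the function names are hypothetical, and the sketch marks only identities (it does not attempt to reproduce the ":" symbols used for similar residues). It uses the final 41-residue rows of the ORF140ng-1 / ORF140-1 alignment.

```python
def percent_identity(row_a, row_b):
    """Return (identical positions, overlap length, percent identity) for two
    aligned rows of equal length; gaps are not handled in this sketch."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = sum(x == y for x, y in zip(row_a, row_b))
    return matches, len(row_a), 100.0 * matches / len(row_a)


def match_line(row_a, row_b):
    """Build a match line with '|' at identical positions and a space elsewhere."""
    return "".join("|" if x == y else " " for x, y in zip(row_a, row_b))


gonococcal = "FWLVGRLLDMDVPTTLKTWTVNQTLIAFIGFALSALLFAIV"    # last row, ORF140ng-1
meningococcal = "FWLVGRLLDMDVPTTLKTWTVNQTLIALIGFALSALLFAIV"  # last row, ORF140-1

matches, overlap, pct = percent_identity(gonococcal, meningococcal)
print(gonococcal)
print(match_line(gonococcal, meningococcal))
print(meningococcal)
print(f"{matches}/{overlap} identical ({pct:.1f}%)")
```

For this final row alone, 40 of 41 positions are identical; the figure for the whole alignment is obtained the same way over all 461 positions.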

Furthermore, ORF140ng-1 is homologous to an E. coli protein: 

gi|882633 (U29579) ORF_o454 [Escherichia coli] >gi|1789097 (AE000358) o454; 
This 454 aa ORF is 34% identical (9 gaps) to 444 residues of an approx. 456 aa 
protein GNTP_BACLI SW: P46832 [Escherichia coli] Length = 454 
Score = 210 bits (529), Expect = 1e-53 

Identities = 130/384 (33%), Positives = 194/384 (49%), Gaps = 19/384 (4%) 

Query: 88  ETSGGAQSLADALIRMFGEKRAPFAPGVASLIFGFPIFFDAGLIVMLPIVFATARRMKQD 147
           E SGGA+SLA+ R G+KR A +A+ G P+FFD G I++ PI++ A+ K
Sbjct: 80  EHSGGAESLANYFSRKLGDKRTIAALTLAAFFLGIPVFFDVGFIILAPIIYGFAKVAKIS 139

Query: 148 VLPFALASVGAFSVMHVFLPPHPGPIAASEFYGANIGQVLILGLPTAFITWYFSGYMLGK 207
           L F L G +HV +PPHPGP+AA+ A+IG + I+G+ +I GY K

-SGGTQDSDPPKEPAKAGTVVAVMLIPMLLIFLNTGV 257
G T+ SD P A V ++++IP+ +I T
EEGATKLSDKINPPGVA-- LVTSLIVIPIAI IMAGT -- 255

VIL+TGAGG+FG VL SG+GKALA+ + + +P+L F+++LALR +QGS

G +G SH NDSGFW+V + L -

TWTV T++ F GF ++ ++A++
TWTVLTTILGFTGFLITWCVWAVI 454

Based on this analysis, including the identification of the presence of a putative leader sequence 
(double-underlined) and several putative transmembrane domains (single-underlined) in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 71 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 593>: 
GCGTATTTTT TGCCGTTATC GGACTGACTT CCTGCGGCTT TGCCGGTTTC 
AACTTTTTGG GCAGACACCA CGGGCGCAC . GTCGTCCTGA TTCTCATCGG 
CTGTATCGGG CTGATTCCAG TTGCCCATTT CCTCAACCCC GCTGCCGCCG 
CCTTTGCCGC CGCCGGACTG GTGCTGCACG GTTATTCTTT GGCTCGCCGG 
CGCGTGATTG CCGCCTCTTT TCTGCTCGGT ACGGGCTGGA CGCTGATGTC 
GTTGGCAGCA GCTTATCCGG CAGCATTTGC CCTGATGCTG CCCTTGCCCG 
TACTGATGTT TTTCCGTCCG . . 

This corresponds to the amino acid sequence <SEQ ID 594; ORF141>: 

1 ..DFGISPVYLW VAAAFKHLLS PWAADSYDVA RFAGVFFAVI GLTSCGFAGF 
51 NFLGRHHGRX VVLILIGCIG LIPVAHFLNP AAAAFAAAGL VLHGYSLARR 
101 RVIAASFLLG TGWTLMSLAA AYPAAFALML PLPVLMFFRP .. 

Further work revealed the complete nucleotide sequence <SEQ ID 595>: 

1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAGCCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG 

201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT 

251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACTCATACGA TGCCGCACGC 

301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCCT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAgCGTC GTCCTGATTC 

401 TCATCGGCTG TATCGGGCTG ATTCCAGTTG CCCATTTCCT CAACCCCGCT 

451 GCCGCCGCCT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC 

501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGCTGGACGC 

551 TGATGTCGTT GGCAGCAGCT TATCCGGCAG CATTTGCCCT GATGCTGCCC 

601 TTGCCCGTAC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCACTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC 

751 TATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACGTTC AGACGGCATT 

801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCCGCGC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC 

951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG 

1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCGGTCGTC CGGAGTATGG AGGCATCGCT 

1401 TTCCCCGGAA TTGAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA 

1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCTCCT 

1551 GCCCCAAAAT GCGGATGCGC CGCAAGGCTG GCAGACGGTT TGGCAGGGTG 

1601 CGCGTCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAATCGGG 

1651 GAAAATATAT AA 

This corresponds to the amino acid sequence <SEQ ID 596; ORF141-1>: 

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFS HDLW NPDEPAVYTA 

51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADSYDAAR 

101 FAGVFFAVIG LTSCGFA GFN FLGRHHGRS V VLILIGCIGL I PVAHF LN PA 

151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSL AAA YPAAFALMLP 

201 LPVLMFF RPW QSRRL MLTAV AS LAFAL PLM TV YPLLLAKT QPALFAQWLD 

251 YHVFGT FGGV RHVQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD 

301 W GILGVVWML AVLVLLAVN P QRFQDNLVWL LPPLALFGAA QLDSLRRGAA 

351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP 

401 IPMAVAVLFT PLWLWAI TRK NIRGRQAVTN WAAGVTLTWA LLMTLFL PWL 

451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT 

501 LPHRVGDVQC RYRIVLLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG 

551 ENI* 
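
The lengths of a complete coding sequence and of the protein it encodes can be cross-checked: each amino acid corresponds to one codon of three nucleotides, and the final codon is a stop. The following Python sketch shows the arithmetic; the function name protein_length is hypothetical. The figures used are read off the listings: SEQ ID 595 ends at nucleotide 1662 and ORF141-1 ends at residue 553, and the variant gonococcal sequence SEQ ID 591 ends at nucleotide 1386 while ORF140ng-1 ends at residue 461.

```python
def protein_length(cds_nucleotides, has_stop_codon=True):
    """Number of amino acids encoded by a complete coding sequence.

    Assumes the sequence starts at the first base of the start codon and,
    when has_stop_codon is True, ends with a single stop codon.
    """
    if cds_nucleotides % 3 != 0:
        raise ValueError("a complete coding sequence is a multiple of 3 bases")
    codons = cds_nucleotides // 3
    return codons - 1 if has_stop_codon else codons


print(protein_length(1662))  # SEQ ID 595 -> ORF141-1
print(protein_length(1386))  # SEQ ID 591 -> ORF140ng-1
```

A mismatch between the two lengths would point to a transcription error or to a frameshift in the nucleotide sequence.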






Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF141 shows 95.0% identity over a 140aa overlap with an ORF (ORF141a) from strain A of N. 
meningitidis: 

10 20 30 

orf141.pep DFGISPVYLWVAAAFKHLLSPWAADSYDVA 

111! I I I I I I I I I I I I I I I I I I I I I I : I 
orf141a WNPDEPAVYTAVEALAGSPTPLVAHLFGQIDFGIPPVYLWVAAAFKHLLSPWAADPYDAA 

40 50 60 70 80 90 

40 50 60 70 80 90 

orf141.pep RFAGVFFAVIGLTSCGFAGFNFLGRHHGRXVVLILIGCIGLIPVAHFLNPAAAAFAAAGL 

| || | | i | | I : I I I I II II I I II I I I I I II Mill II II II II:: I II I Mil 

orf141a RFAGVFFAVVGLTSCGFAGFNFLGRHHGRSVVLILIGCIGLIPTVHFLNPAAAAFAAAGL 
100 110 120 130 140 150 

100 110 120 130 140 

orf141.pep VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRP 
I M II I I I I I I I I I I I I I I I I I M I M I I II I I I I II I I M II II II II I 
orf141a VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTA 
160 170 180 190 200 210 

orf141a VASLAFALPLMTVYPLLLAKTQPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKNLLWF 
220 230 240 250 260 270 

The complete length ORF141a nucleotide sequence <SEQ ID 597> is: 

1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAGCCGTGG CTGTTGCTGT TGATGGCGTT TGCCTGGTTG TGGCCCGGCG 

101 TGTTTTCCCA CGATTTGTGG AATCCTGACG AACCTGCCGT CTATACCGCC 

151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCT TTGGTTGCCC ATCTGTTCGG 

201 TCAAATCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCGT 

251 TCAAACATTT GCTGTCGCCG TGGGCTGCCG ACCCGTATGA TGCCGCACGC 

301 TTTGCCGGCG TGTTTTTCGC CGTTGTCGGA CTGACTTCCT GCGGCTTTGC 

351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTC GTCCTGATTC 

401 TCATCGGCTG TATCGGGCTG ATTCCGACCG TACACTTTCT CAACCCCGCT 

451 GCCGCCGCCT TTGCCGCCGC CGGACTGGTG CTGCACGGTT ATTCTTTGGC 

501 TCGCCGGCGC GTGATTGCCG CCTCTTTTCT GCTCGGTACG GGTTGGACGC 

551 TGATGTCGTT GGCAGCAGCT TATCCGGCGG CATTTGCCCT GATGCTGCCC 

601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCTT GGCAAAAACG CAGCCCGCGC TGTTCGCGCA ATGGCTCGAC 

751 GATCACGTTT TCGGTACGTT CGGCGGCGTG CGGCACATTC AGACGGCATT 

801 CAGTTTGTTT TACTATCTGA AAAACCTGCT TTGGTTTGCA TTGCCTGCGC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CGCGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCGTCGT CTGGATGCTT GCCGTTTTGG TGCTGCTTGC 

951 CGTCAATCCG CAGCGTTTTC AGGATAACCT CGTCTGGCTG CTTCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGACG CGGCGCGGCG 

1051 GCGTTTGTCA ACTGGTTCGG CATTATGGCG TTCGGACTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTATTTC AGCCCGTATT ATGTTCCTGA TATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 
1251 TACCCGCAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 
1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGCT 

1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGACA 
1451 TAGGCGGCGG CGACCTACAC ACGCGGATTG TTTGGACGCA GTACGGCACA 
1501 TTGCCGCACC GCGTCGGCGA TGTACAATGC CGCTACCGCA TCGTCCGCTT 
1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 
1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTCG CACTGATACG GAAAACCGGG 
1651 GAAAATATAT TAAAAACAAC AGATTGA 

This encodes a protein having amino acid sequence <SEQ ID 598>: 

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFS HDLW NPDEPAVYTA 
51 VEALAGSPTP LVAHLFGQID FGIPPVYLWV AAAFKHLLSP WAADPYDAAR 
101 FAGVFFAVVG LTSCGFA GFN FLGRHHGRS V VLILIGCIGL IPTVHF LNPA 
151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLA AA YPAAFALMLP 
201 LPVLMFF RPW QSRRL MLTAV ASLAFALPLM TVYPLLLAKT QPALFAQWLD 
251 DHVFGTFGGV RHIQTAFSLF YYLKNLLWFA LPALPLAVWT VCRTRLFSTD 
301 W GILGVVWML AVLVLLAV NP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA 
351 AFVNWFGIMA FGLFAVFLWT GFFAMNYGWP AKLAERAAYF SPYYVPDIDP 
401 IPMAVAVLFT PLWLWAI TRK NIRGRQAVTN WAAGVTLTWA LLMTLFLPWL 
451 DAAKSHAPVV RSMEASLSPE LKRELSDGIE CIDIGGGDLH TRIVWTQYGT 
501 LPHRVGDVQC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKTG 
551 ENILKTTD* 



ORF141a and ORF141-1 show 98.2% identity over a 553 aa overlap: 



orf141a.pep  MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf141-1     MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP

orf141a.pep  LVAHLFGQIDFGIPPVYLWVAAAFKHLLSPWAADPYDAARFAGVFFAVVGLTSCGFAGFN
             |||||||| ||||||||||||||||||||||||| |||||||||||||:|||||||||||
orf141-1     LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAGFN

orf141a.pep  FLGRHHGRSVVLILIGCIGLIPTVHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT
             ||||||||||||||||||||||::||||||||||||||||||||||||||||||||||||
orf141-1     FLGRHHGRSVVLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT

orf141a.pep  GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf141-1     GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT

orf141a.pep  QPALFAQWLDDHVFGTFGGVRHIQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD
             |||||||||| |||||||||||:|||||||||||||||||||||||||||||||||||||
orf141-1     QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD

orf141a.pep  WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf141-1     WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA

orf141a.pep  FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf141-1     FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK

orf141a.pep  NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASLSPELKRELSDGIE
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf141-1     NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASLSPELKRELSDGIE

orf141a.pep  CIDIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVRLPQNADAPQGWQTVWQGARPRNKD
             || |||||||||||||||||||||||||||||||| ||||||||||||||||||||||||
orf141-1     CIGIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD

orf141a.pep  SKFALIRKTGENI
             |||||||| ||||
orf141-1     SKFALIRKIGENI

Homology with a predicted ORF from N. gonorrhoeae 

ORF141 shows 95% identity over a 140aa overlap with a predicted ORF (ORF141ng) from 
N. gonorrhoeae: 

orf141.pep                                DFGISPVYLWVAAAFKHLLSPWAADSYDVA 30
                                          |||| |||||||||||||||||||  ||:|
orf141ng    WNPAEPAVYTAVEALAGSPTPLVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAAHPYDAA 126

orf141.pep  RFAGVFFAVIGLTSCGFAGFNFLGRHHGRXVVLILIGCIGLIPVAHFLNPAAAAFAAAGL 90
            ||||||||||||||||||||||||||||| |||| ||||||||||||:||||||||||||
orf141ng    RFAGVFFAVIGLTSCGFAGFNFLGRHHGRSVVLIHIGCIGLIPVAHFFNPAAAAFAAAGL 186

orf141.pep  VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRP 140
            ||||||||||||||||||||||||||||||||||||||||||||||||||
orf141ng    VLHGYSLARRRVIAASFLLGTGWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTA 246

An ORF141ng nucleotide sequence <SEQ ID 599> was predicted to encode a protein having amino 
acid sequence <SEQ ID 600>: 

1 MPSEAVSARP LCEYLLHLAI RPFLLTLMXT YTPPDARPPA KTHEKPWLLL 
51 LMAFAWLWPG VFS HDLWNPA EPAVYTAVEA LAGSPTPLVA HLFGQTDFGI 
101 PPVYLWVAAA FKHLLSPWAA HPYDAA RFAG VFFAVIGLTS CGFA GFNFLG 
151 RHHGRS VVLI HIGCIGLIPV AHF FNPAAAA FAAAGLVLHG YSLARRRVIA 
201 AS FLLGTGWT LMSL AA AYPA AFALMLPLPV LMFF RPWQSR RL MLTAVAS L 
251 AFALPLMTV Y PLLLAKTQPA LFAQWLNYHV FGTFGGVRHI QRAFSLFHYL 
301 KNLLWFAPPG LPLAVWTVCR TRLFSTD WGI LGIVWMLAVL VLLAF HPQRF 
351 QDNLVWLLPP LALFGAAQLD SLRRGAAAFV NWFG IMAFGL FAVFLWTGFF 
401 AMNYGWPAKL AERAAYFSPY YVPDIDP IPM AVAVLFTPLW LWAI TRKNIR 
451 GRQAVTN WAA GVT LTWALLM TLFL PWLDAA KSHAPVVRSM EASFSPELKR 
501 ELSDGIECIG IGGGDLHTRI VWTQYGTLPH RVGDVRCRYR IVRLPQNADA 
551 PQGWQTVWQG ARPRNKDSKF ALIRKIGENI LKTTD* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 601>: 

1 ATGCTGACCT ATACCCCGCC CGATGCCCGC CCGCCCGCCA AAACCCACGA 

51 AAAACCGTGG CTGCTGCTGT TGATGGCGTT TGCCTGGCTG TGGCCCGGCG 
101 TGTTTTCCCA CGATTTGTGG AATCCTGCCG AACCTGCCGT CTATACCGCC 
151 GTCGAAGCAC TGGCAGGCAG CCCCACCCCC TTGGTTGCCC ATCTGTTCGG 
201 TCAAACCGAT TTCGGCATAC CGCCCGTGTA TCTTTGGGTT GCCGCCGCAT 
251 TCAAACATTT GCTGTCGCCG TGGGCAGCCG ACCCGTATGA TGCCGCACGC 
301 TTTGCAGGCG TATTTTTTGC CGTTATCGGA CTGACTTCTT GCGGCTTTGC 
351 CGGTTTCAAC TTTTTGGGCA GACACCACGG GCGCAGCGTT GTTTTAATCC 

401 ATATCGGCTG TATCGGGCTG ATTCCGGTTG CCCATTTCCT CAATCCcgcc 
451 gccgccgcct tTGCCGCCGC CGGACTGGTG CTGCacggct actcgctgGC 

501 ACGCCGGCGC GTGATtgccg cctctTtccT GCTCGGTACG GGTTGGACGT 
551 TGATGTCGCT GGCGGCAGCT TATCCGGCGG CGTTTGCGCT GATGCTGCCC 
601 CTGCCCGTGC TGATGTTTTT CCGTCCGTGG CAAAGCAGGC GTTTGATGTT 

651 GACGGCAGTC GCCTCGCTTG CCTTTGCCCT GCCGCTTATG ACCGTTTACC 

701 CGCTGCTCtt gGCAAAAACG CAGCCCGCGC TGTTTGCGCA ATGGCTCAAC 

751 TATCACGTTT TCGGTACGTt cggcgGCGTG CGGCAcaTTC AGAggGCatT 

801 Cagtttgttt cactatctgA AAaatctgct ttggttcgca ccgcccgggC 

851 TGCCGCTGGC GGTTTGGACG GTTTGCCGCA CACGCCTGTT TTCGACCGAC 

901 TGGGGGATTT TGGGCATTGT CTGGATGCTT GCCGTTTTGG TGCTGCTCGC 

951 CTTTAATCCG CAGCGTTTTC AAGACAACCT CGTCTGGCTG CTGCCGCCGC 

1001 TTGCCCTGTT CGGCGCGGCG CAACTGGACA GCCTGAGGCG CGGCGCGGCG 

1051 GCTXTTGTCA ACTGGTTCGG CATTATGGCG TTCGGGCTGT TTGCCGTGTT 

1101 CCTGTGGACG GGCTTTTTCG CCATGAATTA CGGCTGGCCC GCCAAGCTTG 

1151 CCGAACGCGC CGCCTACTTC AGCCCGTATT ACGTTCCCGA CATCGATCCC 

1201 ATTCCGATGG CGGTTGCCGT ACTGTTCACA CCCTTGTGGC TGTGGGCGAT 

1251 TACCCGGAAA AACATACGCG GCAGGCAGGC GGTTACCAAC TGGGCGGCAG 

1301 GCGTTACCCT GACCTGGGCT TTGCTGATGA CGCTGTTCCT GCCGTGGCTG 

1351 GACGCGGCGA AAAGCCACGC GCCCGTCGTC CGGAGTATGG AGGCATCGTT 

1401 TTCCCCGGAA TTAAAACGGG AGCTTTCAGA CGGCATCGAG TGTATCGGCA 

1451 TAGGCGGCGG CGACCTGCAC ACGCGGATTG TTTGGACGCA GTACGGCACA 

1501 TTGCCGCACC GCGTCGGCGA TGTCCGTTGC CGCTACCGTA TCGTCCGCCT 

1551 GCCCCAAAAC GCGGATGCGC CGCAAGGCTG GCAGACGGTC TGGCAGGGTG 

1601 CGCGCCCGCG CAACAAAGAC AGTAAGTTTG CACTGATACG GAAAATCGGG 

1651 GAAAATATAT TAAAAACAAC AGATTGA 

This corresponds to the amino acid sequence <SEQ ID 602; ORF141ng-1>: 

1 MLTYTPPDAR PPAKTHEKPW LLLLMAFAWL WPGVFS HDLW NPAEPAVYTA 

51 VEALAGSPTP LVAHLFGQTD FGIPPVYLWV AAAFKHLLSP WAADPYDAAR 

101 FAGVFFAVIG LTSCGFA GFN FLGRHHGRS V VLIHIGCIGL IPVAHF LNPA 

151 AAAFAAAGLV LHGYSLARRR VIAASFLLGT GWTLMSLA AA YPAAFALMLP 

201 LPVLMFF RPW QSRRL MLTAV ASLAFALPLM TV YPLLLAKT QPALFAQWLN 

251 YHVFGTFGGV RHIQRAFSLF HYLKNLLWFA PPGLPLAVWT VCRTRLFSTD 

301 W GILGIVWML AVLVLLAF NP QRFQDNLVWL LPPLALFGAA QLDSLRRGAA 

351 AFVNWFG IMA FGLFAVFLWT GFFA MNYGWP AKLAERAAYF SPYYVPDIDP 

401 IPMAVAVLFT PLWLWAI TRK NIRGRQAVTN WAAGVTLTWA LLMTLFL PWL 

451 DAAKSHAPVV RSMEASFSPE LKRELSDGIE CIGIGGGDLH TRIVWTQYGT 

501 LPHRVGDVRC RYRIVRLPQN ADAPQGWQTV WQGARPRNKD SKFALIRKIG 

551 ENILKTTD* 
ORF141ng-1 and ORF141-1 show 97.5% identity over a 553 aa overlap: 

orf141ng-1.pep  MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPAEPAVYTAVEALAGSPTP
                |||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||
orf141-1        MLTYTPPDARPPAKTHEKPWLLLLMAFAWLWPGVFSHDLWNPDEPAVYTAVEALAGSPTP

orfl41na-l nep LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADPYDAARFAGVFFAVIGLTSCGFAGFN 
g " P P MlllM || II Hill ill II I I II II II INI I MllllllllllllMIIMIlMI 
orf 141-1 LVAHLFGQTDFGIPPVYLWVAAAFKHLLSPWAADSYDAARFAGVFFAVIGLTSCGFAGFN 

orf141ng-1.pep  FLGRHHGRSVVLIHIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAASFLLGT

| | | | M II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I 

orf 14 1-1 FLGRHHGRSVVLILIGCIGLIPVAHFLNPAAAAFAAAGLVLHGYSLARRRVIAAS FLLGT 

orfl41ncr-l pep GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT 

| | | | | | | | I I I I I I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I 

orf 14 1-1 GWTLMSLAAAYPAAFALMLPLPVLMFFRPWQSRRLMLTAVASLAFALPLMTVYPLLLAKT 

orfl41nq-l pep QPALFAQWLNYHVFGTFGGVRHIQRAFSLFHYLKNLLWFAPPGLPLAVWTVCRTRLFSTD 
| | | | | | | I I : II II I II I I I I I : I I I I I I : I I I I I I I I I I : I I I M M I I I I I I M I I 
orf 14 1-1 QPALFAQWLDYHVFGTFGGVRHVQTAFSLFYYLKNLLWFALPALPLAVWTVCRTRLFSTD 

orf141ng-1.pep  WGILGIVWMLAVLVLLAFNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA
                |||||:||||||||||| ||||||||||||||||||||||||||||||||||||||||||
orf141-1        WGILGVVWMLAVLVLLAVNPQRFQDNLVWLLPPLALFGAAQLDSLRRGAAAFVNWFGIMA

orfl41ng-l.pep FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK 
I | | | | I I I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I I I 1 I I M I I I I II I I I M I 
orf 14 1-1 FGLFAVFLWTGFFAMNYGWPAKLAERAAYFSPYYVPDIDPIPMAVAVLFTPLWLWAITRK 

orf141ng-1.pep  NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASFSPELKRELSDGIE
                ||||||||||||||||||||||||||||||||||||||||||||||:|||||||||||||
orf141-1        NIRGRQAVTNWAAGVTLTWALLMTLFLPWLDAAKSHAPVVRSMEASLSPELKRELSDGIE

orf 141ng-l . pep CIGIGGGDLHTRIVWTQYGTLPHRVGDVRCRYRIVRLPQNADAPQGWQTVWQGARPRNKD 

I I I I I I I I II II I I M I I I I I I II I I I I : I I I I I ! I I I I I I I I I I I I I I I I II II I I I I 
orf 14 1-1 CIGIGGGDLHTRIVWTQYGTLPHRVGDVQCRYRIVLLPQNADAPQGWQTVWQGARPRNKD 

orf141ng-1.pep  SKFALIRKIGENILKTTDX

orf141-1        SKFALIRKIGENIX

Based on the presence of several putative transmembrane domains in the gonococcal protein, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 72 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 603>: 

1 . . CAATCCGCCA AATGGTTATC GGGCCAAACT CTAGTCGGCA CAGCAATTGG 

51 GATACGCGGG CAGATAAAGC TTGGCGGCAA CCTGCATTAC GATATATTTA 

101 CCGGCCGCGC ATTGAAAAAG CCCGAATTTT TCCAATCAAG GAAATGGGCA 

151 AGCGGTTTTC AGGTAGGCTA TACGTTTTAA 



This corresponds to the amino acid sequence <SEQ ID 604; ORF142>: 



. QSAKWLSGQT LVGTAIGIRG QIKLGGNLHY DIFTGRALKK PEFFQSRKWA 
SGFQVGYTF* 



Further work revealed the complete nucleotide sequence <SEQ ID 605>: 
1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC 
51 TTTCTCTGCC GACAATCCTT TGGGACTGAG TGATATGTTC TATGTAAATT 
101 ATGGACGTTC GATTGGCGGT ACGCCCGATG AGGAAAGTTT TGACGGCCAT 

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 

251 CAGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAT 

301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCGGT GTAAAACTGT GGATGAGGGA AACAAAAAGT TACATTGATG 

401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CTGCGGGTTG GTTGGCAGAA 

451 CTTTCCCACA AAGAATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA 

501 ATATAAACGC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 

551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 

601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 

751 TCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 

801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGTCGGCAC AGCAATTGGG 

901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 

951 CGGCCGCGCA TTGAAAAAGC CCGAATTTTT CCAATCAAGG AAATGGGCAA 

1001 GCGGTTTTCA GGTAGGCTAT ACGTTTTAA 

This corresponds to the amino acid sequence <SEQ ID 606; ORF142-1>: 

1 MDNSGSEATG KYQGNITFSA DNPLGLSDMF YVNYGRSIGG TPDEESFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

101 TDFGFNRLLY RDAKRKTYLG VKLWMRETKS YIDDAELTVQ RRKTAGWLAE 

151 LSHKEYIGRS TADFKLKYKR GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 SAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLVGTAIG 

301 IRGQIKLGGN LHYDIFTGRA LKKPEFFQSR KWASGFQVG Y TF * 
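
The partial sequence ORF142 <SEQ ID 604> should be contained in the complete sequence ORF142-1 <SEQ ID 606>. The following Python sketch shows one way to locate a fragment within a longer sequence and report its 1-based coordinates; the function name locate_fragment and the variable names are hypothetical. To keep the example short, only residues 251 to 342 of ORF142-1 are used, so the search is offset by 251.

```python
def locate_fragment(full, fragment, first_position=1):
    """Return the 1-based (start, end) of `fragment` in `full`, or None if absent.

    `first_position` is the sequence coordinate of full[0]; it lets a slice of
    a longer sequence be searched while still reporting true coordinates.
    """
    index = full.find(fragment)
    if index < 0:
        return None
    start = first_position + index
    return start, start + len(fragment) - 1


# Residues 251-342 of ORF142-1 (SEQ ID 606), spaces and stop marker removed.
orf142_1_tail = (
    "SAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLVGTAIG"
    "IRGQIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF"
)
# Partial sequence ORF142 (SEQ ID 604), leading dot and stop marker removed.
orf142_partial = "QSAKWLSGQTLVGTAIGIRGQIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF"

print(locate_fragment(orf142_1_tail, orf142_partial, first_position=251))
```

With the sequences as listed, the 59-residue partial sequence maps to residues 284 to 342, that is, the C-terminal end of ORF142-1.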

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. gonorrhoeae 

ORF142 shows 88.1% identity over a 59aa overlap with a predicted ORF (ORF142ng) from 
N. gonorrhoeae: 

orf142.pep                                QSAKWLSGQTLVGTAIGIRGQIKLGGNLHY 30
                                          |||||||||||:||||||||||||||||||
orf142ng    RGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIGIRGQIKLGGNLHY 313

orf142.pep  DIFTGRALKKPEFFQSRKWASGFQVGYTF 59
            ||||||||||||:||::||::||||||:|
orf142ng    DIFTGRALKKPEYFQTKKWVTGFQVGYSF 342

The complete length ORF142ng nucleotide sequence <SEQ ID 607> is: 

1 ATGGATAATT CGGGTAGTGA GGCGACAGGA AAATACCAAG GAAATATCAC 

51 TTTCTCTGCC GACAATCCTT TTGGACTGAG TGATATGTTC TATGTAAATT 

101 ATGGACGTTC AATTGGCGGT ACGCCCGATG AGGAAAATTT TGACGGCCAT 

151 CGCAAAGAAG GCGGATCAAA CAATTACGCC GTACATTATT CAGCCCCTTT 

201 CGGTAAATGG ACATGGGCAT TCAATCACAA TGGCTACCGT TACCATCAGG 

251 CGGTTTCCGG ATTATCGGAA GTCTATGACT ATAATGGAAA AAGTTACAAC 

301 ACTGATTTCG GCTTCAACCG CCTGTTGTAT CGTGATGCCA AACGCAAAAC 

351 CTATCTCAGT GTAAAACTGT GGACGAGGGA AACAAAAAGT TACATTGATG 

401 ATGCCGAACT GACTGTACAA CGGCGTAAAA CCACAGGTTG GTTGGCAGAA 

451 CTTTCCCACA AAGGATATAT CGGTCGCAGT ACGGCAGATT TTAAGTTGAA 

501 ATATAAACAC GGCACCGGCA TGAAAGATGC TCTGCGCGCG CCTGAAGAAG 

551 CCTTTGGCGA AGGCACGTCA CGTATGAAAA TTTGGACGGC ATCGGCTGAT 

601 GTAAATACTC CTTTTCAAAT CGGTAAACAG CTATTTGCCT ATGACACATC 

651 CGTTCATGCA CAATGGAACA AAACCCCGCT AACATCGCAA GACAAACTGG 

701 CTATCGGCGG ACACCACACC GTACGTGGCT TCGACGGTGA AATGAGTTTG 

751 CCTGCCGAGC GGGGATGGTA TTGGCGCAAC GATTTGAGCT GGCAATTTAA 
801 ACCAGGCCAT CAGCTTTATC TTGGGGCTGA TGTAGGACAT GTTTCAGGAC 

851 AATCCGCCAA ATGGTTATCG GGCCAAACTC TAGCCGGCAC AGCAATTGGG 
901 ATACGCGGGC AGATAAAGCT TGGCGGCAAC CTGCATTACG ATATATTTAC 
951 CGGCCGTGCA TTGAAAAAGC CCGAATATTT TCAGACGAAG AAATGGGTAA 
1001 CGGGGTTTCA GGTGGGTTAT TCGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 608>: 

1 MDNSGSEATG KYQGNITFSA DNPFGLSDMF YVNYGRSIGG TPDEENFDGH 

51 RKEGGSNNYA VHYSAPFGKW TWAFNHNGYR YHQAVSGLSE VYDYNGKSYN 

101 TDFGFNRLLY RDAKRKTYLS VKLWTRETKS YIDDAELTVQ RRKTTGWLAE 

151 LSHKGYIGRS TADFKLKYKH GTGMKDALRA PEEAFGEGTS RMKIWTASAD 

201 VNTPFQIGKQ LFAYDTSVHA QWNKTPLTSQ DKLAIGGHHT VRGFDGEMSL 

251 PAERGWYWRN DLSWQFKPGH QLYLGADVGH VSGQSAKWLS GQTLAGTAIG 

301 IRGQIKLGGN LHYDIFTGRA LKKPEYFQTK KWVTGFQVG Y SF * 

The underlined sequence (aromatic-Xaa-aromatic amino acid motif) is usually found at the 
C-terminal end of outer membrane proteins. 
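
The presence of an aromatic-Xaa-aromatic motif at the C-terminus can be tested directly on the amino acid sequence. The following is a minimal Python sketch, assuming that "aromatic" means phenylalanine (F), tryptophan (W) or tyrosine (Y); the application does not define the residue set, and the function name is hypothetical. The example inputs are the last line of the ORF142ng listing, the C-terminal end of ORF142-1 and, as a negative case, the C-terminal end of ORF141a.

```python
# Assumed set of aromatic residues: Phe, Trp, Tyr (histidine is not included).
AROMATIC = frozenset("FWY")


def has_c_terminal_aromatic_motif(protein):
    """True if the last three residues follow the pattern aromatic-Xaa-aromatic.

    Spaces used for grouping and a trailing '*' stop marker are ignored.
    """
    seq = protein.replace(" ", "").upper().rstrip("*")
    if len(seq) < 3:
        return False
    return seq[-3] in AROMATIC and seq[-1] in AROMATIC


print(has_c_terminal_aromatic_motif("IRGQIKLGGN LHYDIFTGRA LKKPEYFQTK KWVTGFQVG Y SF *"))
print(has_c_terminal_aromatic_motif("KWASGFQVGYTF"))
print(has_c_terminal_aromatic_motif("ENILKTTD*"))
```

Under this assumption the ORF142ng sequence (ending in YSF) and ORF142-1 (ending in YTF) both satisfy the pattern, whereas ORF141a (ending in TTD) does not.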



ORF142ng and ORF142-1 show 95.6% identity over a 342aa overlap: 

orf142-1.pep  MDNSGSEATGKYQGNITFSADNPLGLSDMFYVNYGRSIGGTPDEESFDGHRKEGGSNNYA
              |||||||||||||||||||||||:|||||||||||||||||||||:||||||||||||||
orf142ng-1    MDNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYA

orf142-1.pep  VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLG
              |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||:
orf142ng-1    VHYSAPFGKWTWAFNHNGYRYHQAVSGLSEVYDYNGKSYNTDFGFNRLLYRDAKRKTYLS

orf142-1.pep  VKLWMRETKSYIDDAELTVQRRKTAGWLAELSHKEYIGRSTADFKLKYKRGTGMKDALRA
              |||| |||||||||||||||||||:||||||||| ||||||||||||||:||||||||||
orf142ng-1    VKLWTRETKSYIDDAELTVQRRKTTGWLAELSHKGYIGRSTADFKLKYKHGTGMKDALRA

orf142-1.pep  PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf142ng-1    PEEAFGEGTSRMKIWTASADVNTPFQIGKQLFAYDTSVHAQWNKTPLTSQDKLAIGGHHT

orf142-1.pep  VRGFDGEMSLSAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLVGTAIG
              |||||||||| |||||||||||||||||||||||||||||||||||||||||||:|||||
orf142ng-1    VRGFDGEMSLPAERGWYWRNDLSWQFKPGHQLYLGADVGHVSGQSAKWLSGQTLAGTAIG

orf142-1.pep  IRGQIKLGGNLHYDIFTGRALKKPEFFQSRKWASGFQVGYTF
              |||||||||||||||||||||||||:||::||::||||||:|
orf142ng-1    IRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF

In addition, ORF142ng is homologous to the HecB protein of E. chrysanthemi: 

gi|1772622 (L39897) HecB [Erwinia chrysanthemi] Length = 558 
Score = 119 bits (295), Expect = 3e-26 

Identities = 88/346 (25%), Positives = 151/346 (43%), Gaps = 22/346 (6%) 

Query: 2   DNSGSEATGKYQGNITFSADNPFGLSDMFYVNYGRSIGGTPDEENFDGHRKEGGSNNYAV 61
           DNSG ++TG+ Q N + + DN FGL+D ++++ G S + + D + G

+R+++RD

- E + WT SA P Y S++ Q++ L ++L +GG ++
3ADEPRAEFNKWTLSASYYHPV-- TDSITYLGSLYGQYSARALYGSEQLTLGGESSI 456

rDGEMS LP AERGWYWRN DL S WQFKP GHQLYLGA-DVGHVSGQSAKWLSGQTLAG 296
r E RG YWRN+L+WQ G+ ++ A D GH+ + +L G
r -REQYTSGNRGAYWRNELNWQAWQLPVLGNVTFMAAVDGGHLYNHKQDNSTAASLWG 515

Query: 297 TAIGIRGQIKLGGNLHYDIFTGRALKKPEYFQTKKWVTGFQVGYSF 342
           A+G+ + L+G+P+Q V G++VG SF
Sbjct: 516 GAVGMTVASRW-- LSQQVTVGWPISYPAWLQPDTMWGYRVGLSF 558
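
The percentages in the HecB comparison, 88/346 (25%), 151/346 (43%) and 22/346 (6%), are each a count divided by the aligned length. The following Python sketch reproduces them, assuming the report truncates to a whole percent rather than rounding; that assumption is consistent with the three figures quoted here but is not stated in the application, and the function name blast_percent is hypothetical.

```python
def blast_percent(count, aligned_length):
    """Whole-number percentage of `count` over `aligned_length`, truncated.

    Truncation (integer division) is an assumption of this sketch.
    """
    if aligned_length <= 0:
        raise ValueError("aligned length must be positive")
    return (100 * count) // aligned_length


aligned_length = 346
for label, count in (("Identities", 88), ("Positives", 151), ("Gaps", 22)):
    print(f"{label} = {count}/{aligned_length} ({blast_percent(count, aligned_length)}%)")
```

The aligned length of 346 includes gap positions, which is why it exceeds the 341 query residues (2 to 342) covered by the alignment.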

On the basis of this analysis, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

Example 73 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 609>: 

1 ATGCGGACGA AATGGTCAGC AGTGAGAAGC TGCJTACTTG GgCGGACACC 

51 GCCGACATCG ATACCGCTTT GAACCTGTTG TACCGTTTGC AAAAACTCGA 

101 ATTCCTCTAT GGCGATGAAA ACGGTCATTC AGACGGCATC AATTTGwCGG 

151 ACGAGCAATT GCCGTTGCTG ATGGAACAAT TGTCCGGCAG CGGTAAGGCG 

201 TTATTGGTCG ATCGGAACGG TCTGTATCTT GCCAACGCCA ATTTCCATCA 

251 TGAGGCGGCG GAAGAGTTGG GGTTGTTGGC GGCAGAAGTC GCACAGATGG 

301 AAAAGAAATA CCGGCTGCTG ATTAAGAACA AC.. 

This corresponds to the amino acid sequence <SEQ ID 610; ORF143>: 

1 MRTKWSAVRS CTWADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLXD 
51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHEAAEELG LLAAEVAQME 
101 KKYRLLIKNN . . 

Further work revealed the complete nucleotide sequence <SEQ ID 611>: 

1 ATGGAATCAA CACTTTCACT ACAAGCAAAT TTATATCCCC GCCTGACTCC
51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA
101 CTTTGTTGCA CAGCCTGTTG AAAGCAGATG CGGACGAAAT GGTCAGCAGT
151 GAGAAGCTGC TTACTTGGGC GGACACCGCC GACATCGATA CCGCTTTGAA
201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG
251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG
301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT
351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT
401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCTGATT
451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC
501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT
551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT
601 ACTTTGGTAA GGATTTTATA CCGCCGTTAC AGCAACCGCG TGTAA

This corresponds to the amino acid sequence <SEQ ID 612; ORF143-1>:

1 MESTLSLQAN LYPRLTPAGA FYAVSSDAPS AGKTLLHSLL KADADEMVSS
51 EKLLTWADTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM
101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLLI
151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV
201 TLVRILYRRY SNRV*

Computer analysis of this amino acid sequence gave the following results:
Homology with a predicted ORF from N. meningitidis (strain A)

ORF143 shows 92.4% identity over a 105aa overlap with an ORF (ORF143a) from strain A of N.
meningitidis:

10 20 30 

45 or f 143. pep MRTKWSAVRS CTWADTADIDTALNLLYRLQKLEFL 

I : : III I I I I I I I I I I I I I I I II I I I 
or f 1 4 3 a GAFYAVS S DXPSAGKT LLHS LLKADADEMVSSEKLLTWAXT ADI DTALNLLYRLQKLE FL 
40 50 60 70 80 90

nrfl4? Der > yGDENGHSDGINLXDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE 

I I I I I I I I I I I I I I II II M IM MM II M INN I I I I 1 I 1 I 1 I I I I I I I 

orfl43a YGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLANANFHHEAAEELGLLAAE 
5 80 90 100 110 120 130 

100 HO 
orfl43.pep VAQMEKKYRLLIKNN 

in orfl43a vIqmf.KKYRT.XTKNNLYINNNAWGVCDPSGQSELT FFPLYIGSTKFILVIGG IPDLGKEA 

140 150 160 170 180 190 

The complete length ORF143a nucleotide sequence <SEQ ID 613> is: 

1 ATGGAATCAA cantttcact acaagcaaat ttatatcncc gcctgactcc
51 tgccggtgca ttttatgccg tatccagcga tgnccccagt gccggtaaaa
101 ctttgttgca cagcctgttg aaagcggatg cggacgaaat ggtnagcagt
151 GAGAAGCTGC TTACCTGGGC GGANACCGCC GACATCGATA CCGCTTTGAA
201 CCTGTTGTAC CGTTTGCAAA AACTCGAATT CCTCTATGGC GATGAAAACG
251 GTCATTCAGA CGGCATCAAT TTGTCGGACG AGCAATTGCC GTTGCTGATG
301 GAACAATTGT CCGGCAGCGG TAAGGCGTTA TTGGTCGATC GGAACGGTCT
351 GTATCTTGCC AACGCCAATT TCCATCATGA GGCGGCGGAA GAGTTGGGGT
401 TGTTGGCGGC AGAAGTCGCA CAGATGGAAA AGAAATACCG GCTGCNNATT
451 AAGAACAACC TGTATATCAA CAATAACGCT TGGGGCGTTT GCGATCCTTC
501 CGGTCAGAGC GAATTGACAT TTTTCCCATT GTATATCGGT TCAACCAAAT
551 TTATTTTGGT TATCGGCGGC ATTCCCGATT TGGGCAAAGA GGCATTTGTT
601 ACTTTGGTAA GGATNTTATA CCNCCNGTTA CAGCAACCGC GTGTAAAACT

651 TGGGAGAGAG GANGGGTTAT GCAGCAATTA TTGA 
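
Sequence listings of this kind give a starting position followed by blocks of ten residues, which allows a simple consistency check: each row's position number should equal the running length plus one. The sketch below joins such rows and raises an error when a row number disagrees; the function name `parse_listing` is hypothetical and the row layout is an assumption based on the listings in this document.

```python
import re


def parse_listing(rows):
    """Join numbered sequence-listing rows into one upper-case string.

    Each non-blank row is expected to look like '51 TGCCGGTGCA ...':
    a 1-based start position, then whitespace-separated blocks.
    Raises ValueError if a start position does not match the number
    of residues read so far (a typical sign of a damaged row).
    """
    parts = []
    length = 0
    for row in rows:
        match = re.match(r"\s*(\d+)\s+(.*)$", row)
        if match is None:
            continue  # blank or unnumbered line
        position = int(match.group(1))
        chunk = re.sub(r"\s+", "", match.group(2))
        if position != length + 1:
            raise ValueError(
                f"row starts at {position}, expected {length + 1}"
            )
        parts.append(chunk)
        length += len(chunk)
    return "".join(parts).upper()


# First two rows of the ORF143a nucleotide listing.
rows = [
    "1 ATGGAATCAA cantttcact acaagcaaat ttatatcncc gcctgactcc",
    "",
    "51 tgccggtgca ttttatgccg tatccagcga tgnccccagt gccggtaaaa",
]
sequence = parse_listing(rows)
```

A row whose number was split during extraction (for example "4 01 ...") fails the check instead of silently corrupting the sequence.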

This encodes a protein having amino acid sequence <SEQ ID 614>: 

1 MESTXSLQAN LYXRLTPAGA FYAVSSDXPS AGKTLLHSLL KADADEMVSS
51 EKLLTWAXTA DIDTALNLLY RLQKLEFLYG DENGHSDGIN LSDEQLPLLM
101 EQLSGSGKAL LVDRNGLYLA NANFHHEAAE ELGLLAAEVA QMEKKYRLXI
151 KNNLYINNNA WGVCDPSGQS ELTFFPLYIG STKFILVIGG IPDLGKEAFV
201 TLVRXLYXXL QQPRVKLGRE XGLCSNY*

ORF143a and ORF143-1 show 97.1% identity in 207 aa overlap: 

or f 143a pep MESTXSLQANLYXRLTPAGAFYAVSSDXPSAGKTLLHSLLKADADEMVSSEKLLTWAXTA 
35 I I I I I I I I I II I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I II I II I I I II 

orf 143-1 MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA 

or f 143a. pep DIDTALNLLYRLQKLE FLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 

1 M I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I M I M I I I M I I I 
40 orf 143-1 DIDTALNLLYRLQKLE FLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 

orf 143a. pep NANFHHEAAEELGLLAAEVAQMEKKYRLXIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 
I ! I I I I I II I I I I II I I 1 I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 
orf 143-1 NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 

45 

orf 143a . pep STKFILVIGGIPDLGKEAFVTLVRXLY 
I I I I I I M I I I I I I I II I I I I I I I II 
orf 143-1 STKFILVIGGIPDLGKEAFVTLVRILY 
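
The identity figure quoted for an ungapped overlap can be recomputed by counting matching positions. The sketch below assumes that an ambiguous residue (X) never counts as a match, which is consistent with the 97.1% quoted above (201 of 207 positions, the six X positions in ORF143a being the only differences); the function name `percent_identity` is illustrative.

```python
def percent_identity(a, b, ambiguous="X"):
    """Return (matches, length, percent) for two aligned strings.

    Assumptions: the strings are already aligned and of equal length,
    and ambiguous residues or gap characters never count as matches.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1
        for x, y in zip(a, b)
        if x == y and x not in ambiguous and x != "-"
    )
    return matches, len(a), 100.0 * matches / len(a)


# Last row of the ORF143a / ORF143-1 alignment above.
matches, length, percent = percent_identity(
    "STKFILVIGGIPDLGKEAFVTLVRXLY",
    "STKFILVIGGIPDLGKEAFVTLVRILY",
)
```

For this final row the function reports 26 matches over 27 positions.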

Homology with a predicted ORF from N. gonorrhoeae

ORF143 shows 95.5% identity over a 110aa overlap with a predicted ORF (ORF143ng) from
N. gonorrhoeae:

orf143.pep  MRTKWSAVRSCTWADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLXDEQLPLLMEQL 60
            |||||||||||: ||||||||||||||||||||||||||||||||||| |||||||||||
orf143ng    MRTKWSAVRSCSRADTADIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQL 60

orf143.pep  SGSGKALLVDRNGLYLANANFHHEAAEELGLLAAEVAQMEKKYRLLIKNN 110
            ||||||||||||||||||||||||:||||||||||||||||||||||:||
orf143ng    SGSGKALLVDRNGLYLANANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGV 120
An ORF143ng nucleotide sequence <SEQ ID 615> was predicted to encode a protein having amino
acid sequence <SEQ ID 616>:

1 MRTKWSAVRS CSRADTADID TALNLLYRLQ KLEFLYGDEN GHSDGINLSD
51 EQLPLLMEQL SGSGKALLVD RNGLYLANAN FHHESAEELG LLAAEVAQME
101 KKYRLLIRNN LYINNNAWGV CDPSGQSELT FFPLYIGSTK FILVIAGIPD
151 LSKGGICYFG KDFIPPLQQP RVKLGTGGIM RQLLISILED LNNTSTDIIA
201 SAVISTDGLP MATMLPSHLN SDRVGAISAT LLALGSRSVQ ELACGELEQV
251 MIKGKSGYIL LSQAGKDAVL VLVAKETGRL GLILLDAKRA ARHIAEAI*

Further work revealed the following gonococcal DNA sequence <SEQ ID 617>: 

1 ATGGAATCAA CACTTTCACT ACAAGCGAAT TTATATCCCT GCCTGACTCC
51 TGCCGGTGCA TTTTATGCCG TATCCAGCGA TGCCCCCAGT GCCGGTAAAA
101 CTTTGTTGCG CAGCCTGTTG AAAGCGGATG CGGACGAAGT GGTCAGCAGT
151 GAGAAGCTGC TCGCGGCGGA CACCGCCGAC ATCGATACCG CTTTGAACCT
201 GTTGTACCGT TTGCAAAAAC TCGAATTCCT CTATGGCGAT GAAAACGGTC
251 ATTCAGACGG CATCAATTTG TCGGACGAGC AATTGCCGTT GCTGATGGAA
301 CAATTGTCCG GCAGCGGTAA GGCATTATTG GTCGATCGGA ACGGTCTGTA
351 TCTTGCCAAC GCCAATTTCC ATCATGAGTC GGCGGAAGAG TTGGGGTTGT
401 TGGCGGCAGA AGTCGCACAG ATGGAAAAGA AATACCGGCT GCTGATTAGG
451 AACAACCTGT ATATCAACAA TAACGCTTGG GGCGTTTGCG ATCCTTCCGG
501 TCAGAGCGAA TTGACATTTT TCCCATTGTA TATCGGTTCA ACCAAATTTA
551 TTTTGGTTAT CGCCGGCATT CCCGATTTGA GCAAAGAGGC ATTTGTTACT
601 TTGGTAAGGA TTTTATACCG CCGTTACAGC AACCGCGTGT AA

This corresponds to the amino acid sequence <SEQ ID 618; ORF143ng-1>:

1 MESTLSLQAN LYPCLTPAGA FYAVSSDAPS AGKTLLRSLL KADADEVVSS
51 EKLLAADTAD IDTALNLLYR LQKLEFLYGD ENGHSDGINL SDEQLPLLME
101 QLSGSGKALL VDRNGLYLAN ANFHHESAEE LGLLAAEVAQ MEKKYRLLIR
151 NNLYINNNAW GVCDPSGQSE LTFFPLYIGS TKFILVIAGI PDLSKEAFVT
201 LVRILYRRYS NRV*

ORF143ng-1 and ORF143-1 show 95.8% identity in 214 aa overlap:

30 orfl43ng-l.pep MESTLSLQANLYPCLTPAGAFYAVSSDAPSAGKTLLRSLLKADADEWSSEKLLA-ADTA 59 

I II II I I II I I II I II II I 1 I II I I I I I I II II I I : ! | ! I M I I I : I I I I I I I : I I I I 

orf 14 3-1 MESTLSLQANLYPRLTPAGAFYAVSSDAPSAGKTLLHSLLKADADEMVSSEKLLTWADTA 60 

orfl43ng-l.pep DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 119 

35 I I I I I I I I I i II I I I I I I I I I I I | | | | | | | [ [ [ I I I I I I I I I I I I I I M II I I I M I I I I 

orf 14 3-1 DIDTALNLLYRLQKLEFLYGDENGHSDGINLSDEQLPLLMEQLSGSGKALLVDRNGLYLA 120 

orf 1 43ng-l . pep NANFHHESAEELGLLAAEVAQMEKKYRLLIRNNLYINNNAWGVCDPSGQSELTFFPLYIG 17 9 

orf143-1 NANFHHEAAEELGLLAAEVAQMEKKYRLLIKNNLYINNNAWGVCDPSGQSELTFFPLYIG 180

orfl43ng-l.pep STKFILVIAGIPDLSKEAFVTLVRILYRRYSNRV 213 

I I I II I I I = II I I I = I I I I I I II I I I I I II I II i 
orfl43-l STKFILVIGGIPDLGKEAFVTLVRILYRRYSNRV 214 

Based on the presence of the putative transmembrane domains in the gonococcal protein, it is
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.



Example 74 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 619>: 

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 
51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGr 
101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 
151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CA.GGCGCGG 

251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GaCGGGTCAA wTyCCAGCGT 

401 CCGTGGATG. . 

This corresponds to the amino acid sequence <SEQ ID 620; ORF144>: 



1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQXAASMTF TTLLALVPVL
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP XGADMVFDYI NAFREQANRL
101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVX XQRPWM...

Further work revealed the complete nucleotide sequence <SEQ ID 621>:



1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 

101 CGGCGGCAAG CATGACGTTT ACGACGCTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ACATGGTGTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCTGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGTCAAT TCCCAGCGTC 

401 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG

451 CTGTCTTTGG GCGTGGGCAT TTCCTTTATG GTCGGCTCGG TACAGGATGC

501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 

551 CGACGCTGAC CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG 

601 CCAAACCGCT TCGTTCCCGC GCGGCAGGCG TTTGTCGGGG CTTTGGCAAC 

651 AGCGTTTTGT CTGGAAACCG CGCGCTCCCT CTTCACTTGG TATATGGGCA 

7 01 ATTTCGACGG CTACCGCTCG ATTTACGGCG CGTTTGCCGC CGTGCCGTTT 

751 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT 

801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGGCT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CAAAGCCTTG CCTGTT CAGG AGTTCAGACG 

951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 

1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 

1151 TGACACCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 

1201 CAGGCGAAAA AACGGCAGTA G 

This corresponds to the amino acid sequence <SEQ ID 622; ORF144-1>:

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQAAASMTF TTLLALVPVL
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL
101 TAIGSVMLVV TSLMLIRTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP
151 LSLGVGISFM VGSVQDAALA SGAPQWSGAL RTAATLTFMT LLLWGLYRFV
201 PNRFVPARQA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL
301 DAAQKEGKAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT
351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA
401 QAKKRQ*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF144 shows 96.3% identity over a 136aa overlap with an ORF (ORF144a) from strain A of N.
meningitidis:



10 20 30 40 50 60 

orf 1 4 4 . pep MTFLQRLQGLADNKICAFAW FWRRFDEERVPQXAASMTFTT LLALVPVLTVMVAVASI F 
M II M M I M I M 1 I II II I I I I II I I I I I If I 11 II f I I I M I I I I II I I I I II I I I 
orfl44a MTFLQRLQGLADNKICAFA WFWRRFDEERVPQAAASMTFTT LLALVPVLTVMVAVASI F 

10 20 30 40 50 60 

70 80 90 100 110 120 
orf144.pep PVFDRWSDSFVSFVNQTIVPXGADMVFDYINAFREQANRLTAIGSVMLVVTSLMLIRTID
I I I I I I I I ! I I I I I I I I I I I 1 II I 1 1 I I I I I I I I I I 1 I I I I I I I I I I I I I I II M II I 
PVFDRWSDSFVSFVNQTIVPQGADMVFDYINAFREQAN RLTAIGSVMLWTSXML IRTID 
70 80 90 100 110 120 

130 

NT FNRI WRVXXQRPWM 
I I M I I I II Mill 

NT FNRI WRVN S QRPWMMQFLVYW ALLTFGPLSLGVGISFXV G SVQDAALAS GAPQWS GAL 
130 140 150 160 170 180 

The complete length ORF144a nucleotide sequence <SEQ ID 623> is: 

1 ATGACCTTTT TACAACGTTT GCAAGGTTTG GCAGACAATA AAATCTGTGC 

51 GTTTGCATGG TTCGTCGTCC GCCGCTTTGA TGAAGAACGC GTACCGCAGG 

101 CGGCGGCAAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTGCTG 

151 ACCGTGATGG TGGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGNTGGTC 

201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG 

251 ACATGGTNTT CGACTATATC AATGCGTTCC GCGAGCAGGC GAACCGGCTG 

301 ACGGCAATCG GCAGCGTGAT GCTGGTCGTT ACCTCGCNGA TGCTGATTCG 

351 GACGATAGAC AATACGTTCA ACCGCATCTG GCGGGT CAAT TCCCAGCGTC 

401 CGTGGATGAT GCAGTTTCTC GTCTATTGGG CTTTACTGAC GTTCGGGCCG 

451 CTGTCTTTGG GCGTGGGCAT TTCCTTTATN GTCGGCTCGG TACAGGATGC 

501 CGCGCTTGCC TCAGGTGCGC CGCAGTGGTC GGGCGCGTTG CGAACGGCGG 

551 CGACGCTGAN CTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTNCGTG 

601 CCAAACCGCT TCGTTCCCGC GCGGCANGCG TTTGTCGGGG CTTTGGCAAC 

651 AGCGTTCTGT CTGGAAACCG CGCGTTCCCT CTTTACTTGG TATATGGGCA 

7 01 ATTTCGACGG CTACCGCTCG ATTTACGGNG CGTTTGCCGC CGTGCCGTTT 

7 51 TTTCTGTTGT GGCTGAACCT GTTGTGGACG CTGGTCTTGG GCGGCGCGGT 

801 GCTGACTTCT TCACTCTCCT ACTGGCAGGG AGAAGCGTTC CGCAGGGNCT 

851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG 

901 GATGCGGCGC AAAAAGAAGG CNAAGCCTTG CCTGTTCAGG AGTTCAGACG 

951 GCATATCAAT ATGGGCTACG ACGAGTTGGG CGAGCTTTTG GAAAAGCTGG 

1001 CGCGGCACGG CTACATCTAT TCCGGCAGAC AGGGTTGGGT GTTGAAAACG 

1051 GGGGCGGATT CGATTGAGTT GAACGAACTC TTCAAGCTCT TCGTTTACCG 

1101 TCCGTTGCCT GTGGAAAGGG ATCATGTGAA CCAAGCTGTC GATGCGGTAA 

1151 TGATGCCGTG TTTGCAGACT TTGAACATGA CGCTGGCAGA GTTTGACGCT 

1201 CAGGCGAAAA AACAGCAGCA ATCTTGA

This encodes a protein having amino acid sequence <SEQ ID 624>: 

1 MTFLQRLQGL ADNKICAFAW FVVRRFDEER VPQAAASMTF TTLLALVPVL
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI NAFREQANRL
101 TAIGSVMLVV TSXMLIRTID NTFNRIWRVN SQRPWMMQFL VYWALLTFGP
151 LSLGVGISFX VGSVQDAALA SGAPQWSGAL RTAATLXFMT LLLWGLYRXV
201 PNRFVPARXA FVGALATAFC LETARSLFTW YMGNFDGYRS IYGAFAAVPF
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRXFDSRGRF DDVLKILLLL
301 DAAQKEGXAL PVQEFRRHIN MGYDELGELL EKLARHGYIY SGRQGWVLKT
351 GADSIELNEL FKLFVYRPLP VERDHVNQAV DAVMMPCLQT LNMTLAEFDA
401 QAKKQQQS*

ORF144a and ORF144-1 show 97.8% identity in 406 aa overlap: 

orf 14 4a . pep MT FLQRLQGLADNKI CAFAW FWRRFDEERVPQAAASMT FTT LLALV PVLTVMVAVAS I F 
I M I I M I I I I I I M II I I I I 1 I I I I I II 1 I I I I 1 I I I | | || || || | | 1 | | M I M M I 1 
orf 144-1 MTFLQRLQGLADNKICAFAWFWRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 

orf 14 4a. pep PVFDRWSDSFVSFVNQTIVPQGADiyrv'FDYINAFREQANRLTAIGSVMLWTSXMLIRTID 
I I I I I I I I M II I I I I I I I I I I I I I I I I I I I | | I I | | | | | | | | | | | | | | | || | | | | | | | 
orfl44-l PVFDRW S D S FVS FVNQT IV PQGADMVFDY INAFREQANRLTAI GS VMLWT S LMLIRT I D 

orf 14 4a. pep NTFNRIWRVNSQRPWMMQFLVYWALLT FGPLSLGVGISFXVGSVQDAALASGAPQWSGAL 
N M I II I M I I J I I I I I I | I I | | || | | | | | | | | | j M | 1 I I I | | | | | | | I I I I I I I I I 
orf 144-1 NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL 

orf 144a . pep RTAATLXFMTLLLWGLYRXVPNRFVPARXAFVGALATAFCLETARSLFTWYMGNFDGYRS 
, I I I I I I : I I I I I I I I I I I I II I i I I I I I I M M I I I i I i I N II I I I II II I I I I | | | 

orf 144-1 RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS 



orf144a.pep IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRXFDSRGRFDDVLKILLLL
I I I I I I I 1 I I I I I I I I I I 1 I I I I 1 M I 1 1 I I I 1 I 1 I I I I I 1 1 I I I I I I I I I I 1 I I I I I I 
orf 144-1 IYGAFAAVPFFLLWLNLLWTLVLGGAVLTSSLSYWQGEAFRRGFDSRGRFDDVLKILLLL 

5 orf 144a . pep DAAQKEGXALPVQE FRRHINMGYDELGELLEKLARHGYI YSGRQGWVLKTGADSIELNEL 

MUNI I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I M I I M I I I M I i I I I I I I I 
orf 14 4-1 DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYI YSGRQGWVLKTGADSIELNEL 

orf 144a. pep FKLFVYRPLPVERDHVNQAVDAVMMPCLQTLNMTLAEFDAQAKKQQQS 408 
10 I I [ I I I I II I M M II I II M ( I I ! I M M II I I II I M I I! I : I 

orfl44-l FKLFVYRPL PVERDHVNQAVDAVMT PCLQTLNMTLAE FDAQAKKRQ 406 

Homology with a predicted ORF from N. gonorrhoeae 

ORF144 shows 91.2% identity over a 136aa overlap with a predicted ORF (ORF144ng) from
N. gonorrhoeae:

orf 144 .pep MTFLQRLQGLADNKICAFAWFWRRFDEERVPQXAASMTFTTLLALVPVLTVMVAVASIF 60 

I I I I I II I I I II I I 1 I I I I : I M : II I I I I I I I I I I I I I I I I I I M I I I II I II I I 
orfl4 4ng MTFLQCWQGSADNKICAFAWFVIRRFSEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 60 

20 orf 144. pep PVFDRWS DS FVS FVNQT I VPXGADMVFDYINAFREQANRLT AI GS VMLWT S LML 1 RT I D 120 

1 I I I M I I I I I I 1 I I I I I I I I I I I I I II I : I I I : I I I I I I I I I I I I I I I I I I I I I I I I | 
or f 14 4ng PVFDRWSDSFVSFVNQTIVPQGADMVFDYIDAFRDQANRLTAIGSVMLWTSLMLIRTID 12 0 

orfl44.pep NTFNRIWRVXXQRPWM 136 
25 1:1111:'! 

orfl4 4ng NAFNRIWRWTQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDSVLSSGAQQWADAL 18 0 

The complete length ORF144ng nucleotide sequence <SEQ ID 625> is predicted to encode a
protein having amino acid sequence <SEQ ID 626>:

1 MTFLQCWQGS ADNKICAFAW FVIRRFSEER VPQAAASMTF TTLLALVPVL
51 TVMVAVASIF PVFDRWSDSF VSFVNQTTVP QGADMVFDYI DAFRDQANRL
101 TAIGSVMLVV TSLMLIRTID NAFNRIWRVN TQRPWNMQFL VYWALLTFGP
151 LSLGVGISFM VGSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV
201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFAAVPF
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL
301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT
351 GADSIELSEL FKXFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA
401 QAKKQQQS*

Further work revealed the following gonococcal DNA sequence <SEQ ID 627>: 

1 ATGACCTTTT TACAACGTTG GCAAGGTTTG GCGGACAATA AAATCTGTGC
51 ATTTGCATGG TTCGTCATCC GCCGTTTCAG TGAAGAGCGC GTACCGCAGG
101 CAGCGGCGAG CATGACGTTT ACGACACTGC TGGCACTCGT CCCCGTACTG
151 ACCGTAATGG TCGCGGTCGC TTCGATTTTC CCCGTGTTCG ACCGCTGGTC
201 GGATTCGTTC GTCTCCTTCG TCAACCAAAC CATTGTGCCG CAGGGCGCGG
251 ATATGGTGTT CGACTATATC GACGCATTCC GCGATCAGGC AAACCGGCTG
301 ACCGCCATCG GCAGCGTGAT GCTGGTCGTA ACCTCGCTGA TGCTGATTCG
351 GACGATAGAC AATGCGTTCA ACCGCATCTG GCGGGTTAAC ACGCAACGCC
401 CCTGGATGAT GCAGTTCCTC GTTTATTGGG CGTTGCTGAC TTTCGGGCCT
451 TTGTCTTTGG GTGTGGGCAT TTCCTTTATG GTCGGGTCGG TTCAAGACTC
501 CGTACTCTCC TCCGGAGCGC AACAATGGGC GGACGCGTTG AAGACGGCGG
551 CAAGGCTGGC TTTCATGACG CTTTTGCTGT GGGGGCTGTA CCGCTTCGTG
601 CCCAACCGCT TCGTGCCCGC CCGGCAGGCG TTTGTCGGAG CTTTGATTAC
651 GGCATTCTGC CTGGAGACGG CACGTTTCCT GTTCACCTGG TATATGGGCA
701 ATTTCGACGG CTACCGCTCG ATTTACGGCG CATTTGCCGC CGTGCCGTTT
751 TTCCTGCTGT GGTTAAACCT GCTGTGGACG CTGGTCTTGG GCGGGGCGGT
801 GCTGACTTCG TCGCTGTCTT ATTGGCAGGG CGAGGCCTTC CGCAGGGGAT
851 TCGACTCGCG CGGACGGTTT GACGACGTGT TGAAAATCCT GCTGCTTCTG
901 GATGCGGCGC AAAAAGAAGG CCGAACCCTG TCCGTTCAGG AGTTCAGACG
951 GCATATCAAT ATGGGTTACG ATGAATTGGG CGAGCTTTTG GAAAAGCTGG
1001 CGCGGTACGG CTATATCTAT TCCGGCAGAC AGGGCTGGGT TTTGAAAACG
1051 GGGGCGGATT CGATTGAGTT GAGCGAACTC TTCAAGCTCT TCGTGTACCG
1101 CCCGTTGCct gtggaAAGGG ATCATGTGAA CCAAGCTGtc gaTGCGGTAA
1151 TGAcgccgtG TTTGCAGACT TTGAACATGA CGCTGGCGGA GTTTGACGCT
1201 CAGgcgAAAA AACAGCAGCA GTCTTGA

This encodes a variant of ORF144ng, having the amino acid sequence <SEQ ID 628; ORF144ng-1>:

1 MTFLQRWQGL ADNKICAFAW FVIRRFSEER VPQAAASMTF TTLLALVPVL
51 TVMVAVASIF PVFDRWSDSF VSFVNQTIVP QGADMVFDYI DAFRDQANRL
101 TAIGSVMLVV TSLMLIRTID NAFNRIWRVN TQRPWMMQFL VYWALLTFGP
151 LSLGVGISFM VGSVQDSVLS SGAQQWADAL KTAARLAFMT LLLWGLYRFV
201 PNRFVPARQA FVGALITAFC LETARFLFTW YMGNFDGYRS IYGAFAAVPF
251 FLLWLNLLWT LVLGGAVLTS SLSYWQGEAF RRGFDSRGRF DDVLKILLLL
301 DAAQKEGRTL SVQEFRRHIN MGYDELGELL EKLARYGYIY SGRQGWVLKT
351 GADSIELSEL FKLFVYRPLP VERDHVNQAV DAVMTPCLQT LNMTLAEFDA
401 QAKKQQQS*

ORF144ng-1 and ORF144-1 show 94.1% identity in 406 aa overlap:

MTFLQRWQGLADNKICAFAWFVIRRFSEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 

MINI I I I I I I I I ! I : I M : M I I I I I I I I I I M I I M M M I M II I I I M I 

MTFLQRLQGLADNKICAFAWFWRRFDEERVPQAAASMTFTTLLALVPVLTVMVAVASIF 

PV FDRW SDS FVS FVNQT I VPOGADMVFDY I DAFRDQANRLT AI GS VML WT S LML I RT I D 
I I I I I I I I M II I I I I M I I I I II I I I I I I : I I I : I I I I I I I I I I I I I I I I I I I I I I I I I 

PVFDRWSDSFVS FVNQT I VPQGADMVFDY IN AFREQANRLT AI GS VMLVVT S LML I RT I D 
irf 144ng-l . pep NAFNRIWRVNTQRPWNMQFLVYWALLTFGPLSLGVGISFMVGSVQDSVLS SGAQQWADAL 
NTFNRIWRVNSQRPWMMQFLVYWALLTFGPLSLGVGISFMVGSVQDAALASGAPQWSGAL 
LAFMTLLLWGLYRFVPNRFVPARQAFVGALITAFCLETARFLFTWYMGNFDGYRS 

I : II I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I M II I I M 
30 orf 144-1 RTAATLTFMTLLLWGLYRFVPNRFVPARQAFVGALATAFCLETARSLFTWYMGNFDGYRS 



I I ! M I [ : : I I I II I I I I I I I I I I I I I I I I I I I I : II II II I I I I I I I I I I I I I I I : I I 
DAAQKEGKALPVQEFRRHINMGYDELGELLEKLARHGYIYSGRQGWVLKTGADSIELNEL 



I || I I I I I I I I I I I I I I I I I I 1 M I I I I I II I I I I I I I I I I I I I : I 
FKLFVYRPLPVERDHVNQAVDAVMTPCLQTLNMTLAEFDAQAKKRQ 



On the basis of this analysis, including the presence of putative transmembrane domains in the
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising
antibodies.



Example 75 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 629>: 

1 ..AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA
51 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA
101 GCACCGATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC
151 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG
201 CCTGCTTGAA ACACGGGAAC ACGGCTGA

This corresponds to the amino acid sequence <SEQ ID 630; ORF146>:

1 ..RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTDMRQE ISALVILLQR
51 TRRKWLDAHE RQHLRQSLLE TREHG*

Further work revealed the complete nucleotide sequence <SEQ ID 631>:

1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 

51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 

101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC 

151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA 

201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG 

251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC 

301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG 

351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCAGGGCTGA 

401 CGATGTGTAT GCTCATCGGC GACAACGGCA GCGAATGGCT CGACAGCGGA

451 CTCATGCGCG CCATGAACGT CCTCATCGGC GCGGCCATCG CCATCGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGCATGA CCCGCGAACG CCTCGAGGAG AACATGGCGA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCATCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGTAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT
851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC 
901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 
951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence <SEQ ID 632; ORF146-1>:

1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG

51 EWIGMTVFVV LGMLQFQGAI YSKAVERMLG TVIGLGAGLG VLWLNQHYFH

101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG

151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLADC SKMIAEISNG

201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH

251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING

301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR

351 TRRKWLDAHE RQHLRQSLLE TREHG* 
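
A partial sequence can be located within the complete sequence by sliding it along the longer sequence and counting mismatches at each offset. The sketch below does this without gaps; the function name `best_ungapped_placement` is hypothetical. In this example the partial ORF146 fragment lies at the C-terminal end of ORF146-1 and differs at one position (D in the partial sequence, N in the complete one).

```python
def best_ungapped_placement(fragment, full):
    """Return (offset, mismatches) of the best ungapped placement.

    The fragment is compared with every window of the same length in
    the full sequence; the first window with the fewest mismatches wins.
    """
    if len(fragment) > len(full):
        raise ValueError("fragment is longer than the full sequence")
    best = None
    for offset in range(len(full) - len(fragment) + 1):
        window = full[offset:offset + len(fragment)]
        mismatches = sum(1 for x, y in zip(fragment, window) if x != y)
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


# Ten residues of partial ORF146 against a stretch of ORF146-1.
offset, mismatches = best_ungapped_placement(
    "LWLSTDMRQE",
    "QWQGFLWLSTNMRQEISAL",
)
```

The fragment is placed at zero-based offset 5 with a single mismatch.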

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF146 shows 98.6% identity over a 74aa overlap with an ORF (ORF146a) from strain A of N. 
meningitidis: 



10 20 30 

orfl4 6.pep RHARRIRIDTAINPELEALAEHLHYQWQGF 

I I I I ! M I I I I I I I [ I I I I I I I I I I I I I [| 
orfl4 6a KLNGSEIRLLDRHFTLLQTDLQQTVALINGRHARRIRIDTAINPELEALAEHLHYQWQGF 
280 290 300 310 320 330 

40 50 60 70 

orf 14 6 . pep LWLSTDMRQE ISALVILLQRTRRKWLDAHERQHLRQSLLETREHGX 

I M I I : I I I [ I I I I I I I I I I I I I I I I I | | | | | | | | | | | | | ] | | | : 
orf 14 6a LWLSTNMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHSX 
340 350 360 370 

The complete length ORF146a nucleotide sequence <SEQ ID 633> is: 

1 ATGAACACCT CGCAACGCAA CCGCCTCGTC AGCCGCTGGC TCAACTCCTA 

51 CGAACGCTAC CGCTACCGCC GCCTCATCCA CGCCGTCCGG CTCGGCGGGG 

101 CCGTCCTGTT CGCCACCGCC TCCGCCCGGC TGCTCCACCT CCAACACGGC 

151 GAGTGGATAG GGATGACCGT CTTCGTCGTC CTCGGCATGC TCCAGTTTCA 
201 AGGGGCGATT TACTCCAAGG CGGTGGAACG TATGCTCGGC ACGGTCATCG

251 GGCTGGGCGC GGGTTTGGGC GTTTTATGGC TGAACCAGCA TTATTTCCAC 

301 GGCAACCTCC TCTTCTACCT CACCGTCGGC ACGGCAAGCG CACTGGCCGG 

351 CTGGGCGGCG GTCGGCAAAA ACGGCTACGT CCCTATGCTG GCGGGGCTGA 

401 CGATGTGCAT GCTCATCGGC GACAACGGCA GCGAATGGTT CGACAGCGGC 

451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCGGCCATCG CCATCGCCGC 

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GACCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGCATGA CCCGCGAACG CCTCGAAGAG AACATGGCGA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 

7 01 GCGAAAGCCG CATCAGCCCC GCCATGATGG AAGCCATGCA GCACGCCCAC 

751 CGTAAAATTG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT 

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTT GACCGCCACT 

851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGTCGCCCT TATCAACGGC 

901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 

951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 

1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 

1101 CCTGCTTGAA ACACGGGAAC ACAGTTGA 

This encodes a protein having amino acid sequence <SEQ ID 634>: 



1 MNTSQRNRLV SRWLNSYERY RYRRLIHAVR LGGAVLFATA SARLLHLQHG
51 EWIGMTVFVV LGMLQFQGAI YSKAVERMLG TVIGLGAGLG VLWLNQHYFH
101 GNLLFYLTVG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWFDSG
151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLTDC SKMIAEISNG
201 RRMTRERLEE NMAKMRQINA RMVKSRSHLA ATSGESRISP AMMEAMQHAH
251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTVALING
301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR
351 TRRKWLDAHE RQHLRQSLLE TREHS*

ORF146a and ORF146-1 show 99.5% identity in 374 aa overlap: 



orfl4 6a.pep MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFW 

or f 1 4 6-1 lyWTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFW 

orf 14 6a . pep LGMLQFQGAI YSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 
I I 1 I I I I I M I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I | I I | | | | | M I I I 
orf 14 6-1 LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 

orf 14 6a . pep VGKNGYVPMLAGLTMCMLIGDNGSEWFDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 

I M M M I I M I I II II : I I I I I I I I I I I I I I I I I || I | I | | | | | | || | M 

orf 14 6-1 VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 

orf 14 6a . pep FMLADNLTDCSJCMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 
I I I I I M : I I I I I I I I I I I I I I I I I I I I | | I | | | | | | | | | | | | ! | | | | | | | | | j | | | | j | 
orf 14 6-1 FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 

orf 14 6a . pep AMMEAMQHAHRKIWTTELLLTTAAKLQS PKLNGSE IRLLDRHFTLLQT DLQQTVAL ING 
I I I I M I 1 I I M I I I I I I I II II I I II I I I I [ i I II I I I I I I M II II li II II M I M I 
orfl4 6-l AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 

orf 1 4 6a .pep RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 
I I I I I M I I I I I 1 I M I I I II I I I I I I I I I I I I I I I II II II I I I II I I II I I II I II I I 
orf 14 6-1 RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 

orf 14 6a. pep RQHLRQSLLETREHSX 

I I I I I I I I I I I I I I : 
orf 14 6-1 RQHLRQSLLETREHGX 



Homology with a predicted ORF from N. gonorrhoeae 

ORF146 shows 97.3% identity over a 75aa overlap with a predicted ORF (ORF146ng) from 
N. gonorrhoeae: 



CHIR-0160 (356.001) 



-372- 



PATENT 



orf146.pep                                RHARRIRIDTAINPELEALAEHLHYQWQGF
                                          ||||||||||||||||||||||||||||||
orf146ng    KLNGSEIRLLDRHFTLLQTDLQQTAALINGRHARRIRIDTAINPELEALAEHLHYQWQGF 364

orf146.pep  LWLSTDMRQEISALVILLQRTRRKWLDAHERQHLRQSLLETREHG 75
            |||||:|||||||||| ||||||||||||||||||||||||||||
orf146ng    LWLSTNMRQEISALVIPLQRTRRKWLDAHERQHLRQSLLETREHG 409

An ORF146ng nucleotide sequence <SEQ ID 635> was predicted to encode a protein having amino 
acid sequence <SEQ ID 636>: 

1 MSGVRFPSPA PIPSTDPPSG SLCFFTFPLQ TASDJWSSQR KRLSGRWLNS
51 YERYRHRRLI HAVRLGGTVL FATALARLLH LQHGEWIGMT VFVVLGMLQF
101 QGAIYSNAVE RMLGTVIGLG AGLGVLWLNQ HYFHGNLLFY LTIGTASALA
151 GWAAVGKNGY VPMLAGLTMC MLIGDNGSEW LDSGLMRAMN VLIGAAIAIA
201 AAKLLPLKST LMWRFMLADN LADCSKMIAE ISNGRRMTRE RLEQNMVKMR
251 QINARMVKSR SHLAATSGES RISPSMMEAM QHAHRKIVNT TELLLTTAAK
301 LQSPKLNGSE IRLLDRHFTL LQTDLQQTAA LINGRHARRI RIDTAINPEL
351 EALAEHLHYQ WQGFLWLSTN MRQEISALVI PLQRTRRKWL DAHERQHLRQ
401 SLLETREHG*

Further work revealed the following gonococcal DNA sequence <SEQ ID 637>: 

1 ATGAACTCCT CGCAACGCAA ACGCCTTTCC GgccGCTGGC TCAACTCCTA 

51 CGAACGCTac cGCCaccGCC GCCTCATACA TGCCGTGCGG CTCGGCggaa 

101 ccgtCCTGTT CGCCACCGCA CTCGCCCGgc tACTCCACCT CCAacacggc 

151 gAATGGATAG GGAtgaCCGT CTTCGTCGTC CTCGGCATGC TCCAGTTCCA 

201 AGGCgcgatt tActccaacg cggtgGAacg taTGctcggt acggtcatcg 

251 ggctgGGCGC GGGTTTGGgc gTTTTATGGC TGAACCAGCA TTAtttccac 

301 ggcaacCTcc tcttctacct gaccatcggc acggcaagcg cactggccgg 

351 ctGGGCGGCG GTCGGCAAAA acggctacgt ccctatgctg GCGGGGctgA 

401 CGATGTGCAT gctcatcggc gACAACGGCA GCGAATGGCT CGACAGCGGC

451 CTGATGCGCG CGATGAACGT CCTCATCGGC GCCGCCATCG CCATTGCCGC

501 CGCCAAACTG CTGCCGCTGA AATCCACACT GATGTGGCGT TTCATGCTTG 

551 CCGACAACCT GGCCGACTGC AGCAAAATGA TTGCCGAAAT CAGCAACGGC 

601 AGGCGTATGA CGCGCGAACG TTTGGAGCAG AATATGGTCA AAATGCGCCA 

651 AATCAACGCA CGCATGGTCA AAAGCCGCAG CCACCTCGCC GCCACATCGG 

701 GCGAAAGCCG CATCAGCCCC TCCATGATGG AAGCCATGCA GCACGCCCAC

751 CGCAAAATCG TCAACACCAC CGAGCTGCTC CTGACCACCG CCGCCAAGCT

801 GCAATCTCCC AAACTCAACG GCAGCGAAAT CCGGCTGCTC GACCGCCACT
851 TCACACTGCT CCAAACCGAC CTGCAACAAA CCGCCGCCCT CATCAACGGC
901 AGACACGCCC GCCGCATCCG CATCGACACC GCCATCAACC CCGAACTGGA 
951 AGCCCTCGCC GAACACCTCC ACTACCAATG GCAGGGCTTC CTCTGGCTCA 

1001 GCACCAATAT GCGTCAGGAA ATTTCCGCCC TCGTCATCCT GCTGCAACGC 
1051 ACCCGCCGCA AATGGCTGGA TGCCCACGAA CGCCAACACC TGCGCCAAAG 
1101 CCTGCTTGAA ACACGGGAAC ACGGCTGA 

This corresponds to the amino acid sequence <SEQ ID 638; ORF146ng-1>:

1 MNSSQRKRLS GRWLNSYERY RHRRLIHAVR LGGTVLFATA LARLLHLQHG
51 EWIGMTVFVV LGMLQFQGAI YSNAVERMLG TVIGLGAGLG VLWLNQHYFH
101 GNLLFYLTIG TASALAGWAA VGKNGYVPML AGLTMCMLIG DNGSEWLDSG
151 LMRAMNVLIG AAIAIAAAKL LPLKSTLMWR FMLADNLADC SKMIAEISNG
201 RRMTRERLEQ NMVKMRQINA RMVKSRSHLA ATSGESRISP SMMEAMQHAH
251 RKIVNTTELL LTTAAKLQSP KLNGSEIRLL DRHFTLLQTD LQQTAALING
301 RHARRIRIDT AINPELEALA EHLHYQWQGF LWLSTNMRQE ISALVILLQR
351 TRRKWLDAHE RQHLRQSLLE TREHG*

ORF146ng-1 and ORF146-1 show 96.5% identity in 375 aa overlap:

orf 14 6-1 . pep MNTSQRNRLVSRWLNSYERYRYRRLIHAVRLGGAVLFATASARLLHLQHGEWIGMTVFVV 
I | : I I I : I I : I I I I II I I I I : I I I I I I I I I I I : I 1 M M II II I I I I I I I I I I I I I I I 
orfl4 6ng-l MNSSQRKRLSGRWLNSYERYRHRRLIHAVRLGGTVLFATALARLLHLQHGEWIGMTVFVV 

orf146-1.pep  LGMLQFQGAIYSKAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTVGTASALAGWAA 
I I I I I I I I I I I I : I II I I I II I I I I II I I I I I I I I I I I I I II I I I I I I : I I I i M I II M 
orf146ng-1    LGMLQFQGAIYSNAVERMLGTVIGLGAGLGVLWLNQHYFHGNLLFYLTIGTASALAGWAA 






orf146-1.pep  VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMVLIGAAIAIAAAKLLPLKSTLMWR 
I I I I I I I I I I I I I I I I I I I I I 1 N I I 1 I M ! I I I I I I I I I I I I I I I I I I I 1 I I 1 1 I M M 
orf146ng-1    VGKNGYVPMLAGLTMCMLIGDNGSEWLDSGLMRAMNVLIGAAIAIAAAKLLPLKSTLMWR 

orf146-1.pep  FMLADNLADCSKMIAEISNGRRMTRERLEENMAKMRQINARMVKSRSHLAATSGESRISP 

! I I I I I II I I I I I I I I I I I I I II I 1 1 I I I : I I : I I I I I 1 I 1 1 1 I 1 I I I I I I I I I I I I I II 
orf146ng-1    FMLADNLADCSKMIAEISNGRRMTRERLEQNMVKMRQINARMVKSRSHLAATSGESRISP 






orf146-1.pep  AMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTVALING 
: I II I II I M M M M M I M I I I 11 I M M I I I I I I II I I II I I II I M I I I I : I I I I I 
orf146ng-1    SMMEAMQHAHRKIVNTTELLLTTAAKLQSPKLNGSEIRLLDRHFTLLQTDLQQTAALING 

orf146-1.pep  RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 
I I I I I I I I I I I I I I II I I I I I I I I I i ! I I I I I I I II I I I I I I I I I I I I II I I I II II II ( 
orf146ng-1    RHARRIRIDTAINPELEALAEHLHYQWQGFLWLSTNMRQEISALVILLQRTRRKWLDAHE 



orf146-1.pep  RQHLRQSLLETREHGX 
              |||||||||||||||| 
orf146ng-1    RQHLRQSLLETREHGX 

Furthermore, ORF146ng-1 shows homology with a hypothetical E. coli protein: 

sp|P33011|YEEA_ECOLI HYPOTHETICAL 40.0 KD PROTEIN IN COBO-SBMC INTERGENIC REGION 
>gi|1736674|gnl|PID|d1016553 (D90838) ORF_ID:o348#20; similar to [SwissProt 
Accession Number P33011] [Escherichia coli] >gi|1736682|gnl|PID|d1016560 (D90839) 
ORF_ID:o348#20; similar to [SwissProt Accession Number P33011] [Escherichia coli] 
>gi|1788318 (AE000292) f352; 100% identical to fragment YEEA_ECOLI SW: P33011 but 
has 203 additional C-terminal residues [Escherichia coli] Length = 352 
Score = 109 bits (271), Expect = 2e-23 
Identities = 89/347 (25%), Positives = 150/347 (42%), Gaps = 21/347 (6%) 

Query: 20 YRHRRLIHAVRLGGTVLFATALARLLHLQHGEWIGMTVFVVLGMLQFQGAIYSNAVERML 79 

YRH R++H R+ L + RL + W +T+ V++G + F G + A ER+ 
Sbjct: 15 YRHYRIVHGTRVALAFLLTFLIIRLFTIPESTWPLVTMWIMGPISFWGNWPRAFERIG 74 

Query: 80 GTVIGLGAGLGVLWLNQHYFHGNLLFYLTIGTASALAGWAAVGKNGYVPMLAGLTMCMLI 139 
GTV+G GL L L L + A L GW A+GK Y +L G+T+ +++ 

Sbjct: 75 GTVLGSILGLIALQLE LISLPLMLVWCAAAMFLCGWLALGKKPYQGLLIGVTLAIW 131 

Query: 140 GDNGSEWLDSGLMRAMNVLIGXXXXXXXXKLLPLKSTLMWRFMLADNLADCSKMIAEISN 199 
G E +D+ L R+ +V++G + P ++ + WR LA +L + +++ + 

Sbjct: 132 GSPTGE-IDTALWRSGDVILGSLLAMLFTGIWPQRAFIHWRIQLAKSLTEYNRVYQSAFS 190 

Query: 200 GRRMTRERLEQNMVKMRQINARMVKSRSHLAATSGESRISPSMMEAMQHAHRKIVNXXXX 259 
+ R RLE ++ K+ VK R +A S E+RI S+ E +Q +R +V 

Sbjct: 191 PNLLERPRLESHLQKLL TDAVKMRGLIAPASKETRIPKSIYEGIQT INRNLVCMLEL 247 

Query: 260 XXXXXXXXQSPK LNGSEIRLLDRHFXXXXXXXXXXAALINGRHARRIRIDTAINPEL 316 

+ LN ++R D AL G +N + 

Sbjct: 248 QINAYWATRPSHFVLLNAQKLR— DTQHMMQQILLSLVHALYEGNPQPVFANTEKLNDAV 305 

Query: 317 EALAEHL— HYQWQ GFLWLSTNMRQEISALVILLQRTRRK 354 

E L + L H+ + G++WL+ ++ L L+ R RK 

Sbjct: 306 EELRQLLNNHHDLKWETPIYGYVWLNMETAHQLELLSNLICRALRK 352 

On the basis of this analysis, including the identification of several transmembrane domains in the 
gonococcal protein, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 76 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 639>: 



1 . . GCCGAAGACA CGCGCGTTAC CGCACAGCTT TTGAGCGCGT ACGGCATTCA 
51 GGGCAAACTC GTCAGTGTGC GCGAACACAA CGAACGGCAG ATGGCGGACA 
101 AGATTGTCGG CTATCTTTCA GACGGCATGG TTGTGGCACA GGTTTCCGAT 

151 GCGGGTACGC CGGCCGTGTG CGACCCGGGC GCGAAACTCG CCCGCCGCGT 

201 GCGTGAGGCC GGGTTTAAAG TCGTTCCCGT CGTGGGCGCA AC . GCGGTGA 

251 TGGCGGCTTT GAGCGTGGCC GGTGTGGAAG GATCCGATTT TTATTTCAAC 

301 GGTTTTGTAC CGCCGAAATC GGGAGAACGC AGGAAACTGT TTGCCAAATG 

351 GGTGCGGGCG GCGTTTCCTA TCGTCATGTT TGAAACGCCG CACCGCATCG 

401 GTGCAGCGCT TGCCGATATG GCGGAACTGT TCCCCGAACG CCGATTAATG 

451 CTGGCGCGCG AAATTACGAA AACGTTTGAA ACGTTCTTAA GCGGCACGGT 

501 TGGGGAAATT CAGACGGCAT TGTCTGCCGA CGGCGACCAA TCGCGCGGCG 

551 AGATGGTGTT GGTGCTTTAT CCGGCGCAGG ATGAAAAACA CGAAGGCTTG 

601 TCCGAGTCCG CGCAAAACAT CATGAAAATC CTCACAGCCG AGCTGCCGAC 

651 CAAACAGGCG GCGGAGCTTG CTGCCAAAAT CACGGGCGAG GGAAAGAAAG 

701 CTTTGTACGA T. . 

This corresponds to the amino acid sequence <SEQ ID 640; ORF147>: 



1 . . AEDTRVTAQL LSAYGIQGKL VSVREHNERQ MADKIVGYLS DGMVVAQVSD 

51 AGTPAVCDPG AKLARRVREA GFKVVPVVGA XAVMAALSVA GVEGSDFYFN 

101 GFVPPKSGER RKLFAKWVRA AFPIVMFETP HRIGAALADM AELFPERRLM 

151 LAREITKTFE TFLSGTVGEI QTALSADGDQ SRGEMVLVLY PAQDEKHEGL 

201 SESAQNIMKI LTAELPTKQA AELAAKITGE GKKALYD . . 

Further work revealed the complete nucleotide sequence <SEQ ID 641>: 



1 ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCGGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATCTGTGC CGAAGACACG 

151 CGCGTTACCG CACAGCTTTT GAGCGCGTAC GGCATTCAGG GCAAACTCGT 

201 CAGTGTGCGC GAACACAACG AACGGCAGAT GGCGGACAAG ATTGTCGGCT 

251 ATCTTTCAGA CGGCATGGTT GTGGCACAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GTGAGGCCGG 

351 GTTTAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTGATG GCGGCTTTGA 

401 GCGTGGCCGG TGTGGAAGGA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GAGAACGCAG GAAACTGTTT GCCAAATGGG TGCGGGCGGC 

501 GTTTCCTATC GTCATGTTTG AAACGCCGCA CCGCATCGGT GCGACGCTTG 

551 CCGATATGGC GGAACTGTTC CCCGAACGCC GATTAATGCT GGCGCGCGAA 

601 ATTACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG TCTGCCGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCCGCG 
751 CAAAACATCA TGAAAATCCT CACAGCCGAG CTGCCGACCA AACAGGCGGC 

801 GGAGCTTGCT GCCAAAATCA CGGGCGAGGG AAAGAAAGCT TTGTACGATC 
851 TGGCTCTGTC TTGGAAAAAC AAATAG 

This corresponds to the amino acid sequence <SEQ ID 642; ORF147-1>: 



1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGKLVSVR EHNERQMADK IVGYLSDGMV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVEG SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPI VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL SADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNIMKILTAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 
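
The correspondence between a nucleotide sequence and its amino acid sequence can be checked with a short script. The following is a minimal sketch only: it assumes the standard genetic code and a reading frame starting at position 1, and the names CODON_TABLE and translate are illustrative and do not come from the application. The input is the first 60 nucleotides of <SEQ ID 641> as listed above.

```python
# Minimal sketch: translate a DNA reading frame with the standard genetic code.
# Assumption: frame starts at the first base; unknown codons are reported as "X".
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the 64-entry codon table from the conventional TCAG ordering.
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA (whitespace ignored) codon by codon; '*' marks a stop."""
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


# First 60 nucleotides of SEQ ID 641, copied from the listing above.
first_60_nt = "ATGTTTCAGA AACATTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC ATTATACGTG"
print(translate(first_60_nt))  # MFQKHLQKASDSVVGGTLYV
```

The output agrees with the first 20 residues of <SEQ ID 642>, which is also how ambiguous scanned residues in the listing can be cross-checked against the nucleotide sequence.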

Computer analysis of this amino acid sequence gave the following results: 
Homology with hypothetical protein ORF286 of E. coli (accession number U18997) 
ORF147 and E. coli ORF286 protein show 36% aa identity in 237 aa overlap: 

Orf147: 1 AEDTRVTAQLLSAYGIQGKLVSVREHNERQMADKIVGYLSDGMVVAQVSDAGTPAVCDPG 60 

AEDTR T LL +GI +L ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG 
Orf286: 43 AEDTRHTGLLLQHFGINARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPG 10 

Orf147: 61 AKLARRVREXXXXXXXXXXXXXXXXXXXXXXXEGSDFYFNGFVPPKSGERRKLFAKWVRA 12 

L R RE F + GF+P KS RR 

Orf286: 103 YHLVRTCREAGIRWPLPGPCAAITALSAAGLPSDRFCYEGFLPAKSKGRRDALKAIEAE 16 

Orf147: 121 AFPIVMFETPHRIGAALADMAELFPERR-LMLAREITKTFETFLSGTVGEIQTALSADGD 17 

++ +E+ HR+ +L D+ + E R ++LARE+TKT+ET VGE+ + D + 

Orf286: 163 PRTLIFYESTHRLLDSLEDIVAVLGESRYWLARELTKTWETIHGAPVGELLAWVKEDEN 22 






Orf147: 180 QSRGEMVLVLYPAQDEKHEGLSESAQNIMKILTAELPTKQAAELAAKITGEGKKALY 236 

+ +GEMVL++ + E L A + +L AELP K+AA LAA+I G K ALY 

Orf286: 223 RRKGEMVLIV-EGHKAQEEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALY 278 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF147 shows 96.6% identity over a 237 aa overlap with ORF75a from strain A of N. meningitidis: 

10 20 30 

orf147.pep AEDTRVTAQLLSAYGIQGKLVSVREHNERQ 
in ' P P I I 1 I M I I I I I I I I I I I I I I I I 

orf7 5a TLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGKLVSVREHNERQ 
20 30 40 50 60 70 

40 50 60 70 80 90 

orf147.pep MADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGAXAVMAALSVA 

| | | | | | I I I Ill I I I I I I I I I I I I I I I I : I I 1 I I I I I I MM 

orf 7 5a MnnKT^^YT.snGMVVAOVSDAGTPAVCDPGAKLARRVREVGFK VVPVVGASAVMAALSVA 
80 90 100 110 120 130 

20 100 110 120 130 140 150 

orfl47 pep GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 

|| I I I I I I I I I M I I I I I I I I I = I I I : i I I I I I I I I M : I i I II I M II II II 

orf 75a GVAGSDFYFNGFVPPKSGERRKLFAKWVRVAFPVVMFETPHRIGATLADMAELFPERRLM 
140 150 160 170 180 190 

25 

160 170 180 190 200 210 

orf 147 pep LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 
| | | | | | | I I II M ! II II II II M : I I I = M I I I I I I I I I II I II I I I I I I I I I I I I II I 
orf 75a LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 
30 200 210 220 230 240 250 

220 230 
orf 14 7 .pep LTAELPTKQAAELAAKITGEGKKALYD 
I I I I I I I I I I I I II I I I I I II I I I I I I 
35 orf 7 5a LTAELPTKQAAELAAKITGEGKKALYDLALSWKNKX 

260 270 280 290 

ORF147a is identical to ORF75a, which includes aa 56-292 of ORF75. 
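
A percent identity figure of the kind quoted for these alignments can be recomputed from any pair of aligned rows. The sketch below is illustrative only: the function name percent_identity is hypothetical, and it assumes that columns containing a gap character are excluded from the overlap, which may differ from the convention of the alignment program actually used. The two rows are one 60-residue block (ORF147 positions 151-210) of the ORF147/ORF75a alignment above.

```python
from typing import Tuple


def percent_identity(row_a: str, row_b: str, gap: str = "-") -> Tuple[int, int, float]:
    """Return (identical, overlap, percent) for two equal-length aligned rows.

    Assumption: columns where either row has a gap are not counted as overlap.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    identical = 0
    overlap = 0
    for a, b in zip(row_a.upper(), row_b.upper()):
        if a == gap or b == gap:
            continue
        overlap += 1
        if a == b:
            identical += 1
    if overlap == 0:
        raise ValueError("rows share no aligned residues")
    return identical, overlap, 100.0 * identical / overlap


# One block of the ORF147 / ORF75a alignment, copied from the rows above.
orf147_row = "LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI"
orf75a_row = "LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI"

identical, overlap, pct = percent_identity(orf147_row, orf75a_row)
print(f"{identical}/{overlap} identical ({pct:.1f}%)")  # 58/60 identical (96.7%)
```

This covers a single block; the 96.6% figure in the text refers to the full 237 aa overlap.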
Homology with a predicted ORF from N.gonorrhoeae 

ORF147 shows 94.1% identity over a 237 aa overlap with a predicted ORF (ORF147ng) from N. 
gonorrhoeae: 

orf147.pep  AEDTRVTAQLLSAYGIQGKLVSVREHNERQ 30 

II II II II II II II II II : II II II II I I I 
orf147ng    TLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQGRLVSVREHNERQ 85 

orf147.pep  MADKIVGYLSDGMVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGAXAVMAALSVA 90 

orf147ng    MADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPVVGASAVMAALSVA 145 

orf147.pep  GVEGSDFYFNGFVPPKSGERRKLFAKWVRAAFPIVMFETPHRIGAALADMAELFPERRLM 150 
II I I I I ! I I I I I I I M I I I II I I I I II I II I : I II I I II I I I I : I I I I I I I I I I I I M 

orf147ng    GVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATLADMAELFPERRLM 205 

orf147.pep  LAREITKTFETFLSGTVGEIQTALSADGDQSRGEMVLVLYPAQDEKHEGLSESAQNIMKI 210 
I I I I I II I I I I II II II I I II I II : I I I = I I I I I I I II I II I 1 1 I I I II I II II II III 
orf147ng    LAREITKTFETFLSGTVGEIQTALAADGNQSRGEMVLVLYPAQDEKHEGLSESAQNAMKI 265 

orf147.pep  LTAELPTKQAAELAAKITGEGKKALYD 237 

orf147ng    LAAELPTKQAAELAAKITGEGKKALYDLALSWKNK 300 
An ORF147ng nucleotide sequence <SEQ ID 643> was predicted to encode a protein having amino 
acid sequence <SEQ ID 644>: 

1 MSVFQTAFFM FQKHLQKASD SVVGGTLYVV ATPIGNLADI TLRALAVLQK 

51 ADIICAEDTR VTAQLLSAYG IQGRLVSVRE HNERQMADKV IGFLSDGLVV 

101 AQVSDAGTPA VCDPGAKLAR RVREAGFKVV PVVGASAVMA ALSVAGVAES 

151 DFYFNGFVPP KSGERRKLFA KWVRAAFPVV MFETPHRIGA TLADMAELFP 

201 ERRLMLAREI TKTFETFLSG TVGEIQTALA ADGNQSRGEM VLVLYPAQDE 

251 KHEGLSESAQ NAMKILAAEL PTKQAAELAA KITGEGKKAL YDLALSWKNK 

301 * 

Further work revealed the following gonococcal DNA sequence <SEQ ID 645>: 

1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC 

51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC 

101 GCGCTTTGGC GGTATTGCAA AAGGCGGACA TCATTTGTGC CGAAGACACG 

151 CGCGTTACTG CGCAGCTTTT GAGCGCGTAC GGCATTCAGG GCAGGTTGGT 

201 CAGTGTGCGC GAACACAACG AGCGGCAGAT GGCGGACAAG GTAATCGGTT 

251 TCCTTTCAGA CGGCCTGGTT GTGGCGCAGG TTTCCGATGC GGGTACGCCG 

301 GCCGTGTGCG ACCCGGGCGC GAAACTCGCC CGCCGCGTGC GCGAAGCAGG 

351 GTTCAAAGTC GTTCCCGTCG TGGGCGCAAG CGCGGTAATG GCGGCGTTGA 

401 GTGTGGCCGG TGTGGCGGAA TCCGATTTTT ATTTCAACGG TTTTGTACCG 

451 CCGAAATCGG GCGAACGTAG GAAATTGTTT GCCAAATGGG TGCGGGCGGC 

501 ATTTCCTGTC GTCATGTTTG AAACGCCGCA CCGAATCGGG GCAACGCTTG 

551 CCGATATGGC GGAATTGTTC CCCGAACGCC GTCTGATGCT GGCGCGCGAA 

601 ATCACGAAAA CGTTTGAAAC GTTCTTAAGC GGCACGGTTG GGGAAATTCA 

651 GACGGCATTG GCGGCGGACG GCAACCAATC GCGCGGCGAG ATGGTGTTGG 

701 TGCTTTATCC GGCGCAGGAT GAAAAACACG AAGGCTTGTC CGAGTCTGCG 

751 CAAAATGCGA TGAAAATCCT TGCGGCCGAG CTGCCGACCA AGCAGGCGGC 

801 GGAGCTTGCC GCCAAGATTA CAGGTGAGGG CAAAAAGGCT TTGTACGATT 

851 TGGCACTGTC GTGGAAAAAC AAATGA 
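
The sequences in this listing are printed as position-numbered rows of ten-residue blocks. As a hedged illustration of how such rows can be joined into one contiguous string for further analysis, the sketch below strips position labels and spacing. The function name parse_numbered_rows is hypothetical; it keeps only letters and the stop symbol, so it is an assumption that the input contains no "." placeholders for undetermined residues (dropping those would shift all later positions).

```python
import re


def parse_numbered_rows(listing: str) -> str:
    """Join position-numbered sequence rows into one contiguous string.

    Digits, spaces and stray punctuation (position labels, scanning gaps)
    are discarded; letters and the '*' stop symbol are kept.
    """
    residues = []
    for line in listing.splitlines():
        residues.append(re.sub(r"[^A-Za-z*]", "", line))
    return "".join(residues)


# First two rows of SEQ ID 645, copied from the listing above.
rows = """\
1 ATGTTTCAGA AACACTTGCA GAAAGCCTCC GACAGCGTCG TCGGAGGGAC
51 ATTATACGTG GTTGCCACGC CCATCGGCAA TTTGGCAGAC ATTACCCTGC
"""
sequence = parse_numbered_rows(rows)
print(len(sequence))  # 100
```

A simple consistency check on any parsed row is that it contributes exactly 50 residues, except possibly the last row of a sequence.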

This corresponds to the amino acid sequence <SEQ ID 646; ORF147ng-1>: 

1 MFQKHLQKAS DSVVGGTLYV VATPIGNLAD ITLRALAVLQ KADIICAEDT 

51 RVTAQLLSAY GIQGRLVSVR EHNERQMADK VIGFLSDGLV VAQVSDAGTP 

101 AVCDPGAKLA RRVREAGFKV VPVVGASAVM AALSVAGVAE SDFYFNGFVP 

151 PKSGERRKLF AKWVRAAFPV VMFETPHRIG ATLADMAELF PERRLMLARE 

201 ITKTFETFLS GTVGEIQTAL AADGNQSRGE MVLVLYPAQD EKHEGLSESA 

251 QNAMKILAAE LPTKQAAELA AKITGEGKKA LYDLALSWKN K* 

ORF147ng shows homology to a hypothetical E. coli protein: 

sp|P45528|YRAL_ECOLI HYPOTHETICAL 31.3 KD PROTEIN IN AGAI-MTR INTERGENIC REGION 
(F286) 
>gi|606086 (U18997) ORF_f286 [Escherichia coli] 
>gi|1789535 (AE000395) hypothetical 31.3 kD protein in agai-mtr intergenic region 
[Escherichia coli] Length = 286 
Score = 218 bits (550), Expect = 3e-56 
Identities = 128/284 (45%), Positives = 171/284 (60%), Gaps = 4/284 (1%) 

Query: 4 KHLQKASDSVVGGTLYVVATPIGNLADITLRALAVLQKADIICAEDTRVTAQLLSAYGIQ 63 
K Q A +S G LY+V TPIGNLADIT RAL VLQ D+I AEDTR T LL +GI 
Sbjct: 2 KQHQSADNSQ--GQLYIVPTPIGNLADITQRALEVLQAVDLIAAEDTRHTGLLLQHFGIN 59 

Query: 64 GRLVSVREHNERQMADKVIGFLSDGLVVAQVSDAGTPAVCDPGAKLARRVREAGFKVVPV 123 
RL ++ +HNE+Q A+ ++ L +G +A VSDAGTP + DPG L R REAG +VVP+ 
Sbjct: 60 ARLFALHDHNEQQKAETLLAKLQEGQNIALVSDAGTPLINDPGYHLVRTCREAGIRVVPL 119 

Query: 124 VGASAVMAALSVAGVAESDFYFNGFVPPKSGERRKLFAKWVRAAFPVVMFETPHRIGATL 183 
G A + ALS AG+ F + GF+P KS RR ++ +E+ HR+ +L 
Sbjct: 120 

Query: 184 
Sbjct: 180 

Query: 243 HEGLSESAQNAMKILAAELPTKQAAELAAKITGEGKKALYDLAL 286 
EL A + +L AELP K+AA LAA+I G K ALY AL 
Sbjct: 239 EEDLPADALRTLALLQAELPLKKAAALAAEIHGVKKNALYKYAL 282 
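
The percentages in a BLAST summary line are the reported counts divided by the alignment length. The sketch below reproduces the figures quoted above for the ORF147ng / E. coli comparison. It assumes that percentages are truncated to whole numbers; for these three values rounding would give the same result. The function name blast_percent is illustrative.

```python
def blast_percent(count: int, alignment_length: int) -> int:
    """Whole-number percentage of count over alignment_length (truncated)."""
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * count) // alignment_length


# Counts taken from the summary line above: alignment length 284.
alignment_length = 284
summary = {"Identities": 128, "Positives": 171, "Gaps": 4}
for label, count in summary.items():
    pct = blast_percent(count, alignment_length)
    print(f"{label} = {count}/{alignment_length} ({pct}%)")
```

Run as written, this prints 45%, 60% and 1%, matching the summary line.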

Based on the computer analysis and the presence of a putative transmembrane domain in the 
gonococcal protein, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 77 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 647> 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGTCGC ATCCGCTTCT C.GCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGcAT TGGTGGGCGt AfCAATATAT TGTGAGCGTG GCACATAACG 

351 GCGGCTATAA CAACGTTGAT TTTGGTGCGG AAGGAAk.AA tATCCC . GAT 

401 CAACAwCGww TTACTTATAA AATTGTGAAA CGGAATAATT ATAAAGCAGG 

451 GACTAAAGGC CATCCTTATG GCGGCGATTA TCATATGCCG CGTTTGCATA 

501 AATwTGTCAC AGATGCAGAA CCTGTTGAAA TGACCAGTTA TATGGATGGG 

551 CGGAAATATA TCGATCAAAA TAATTACCCT GACCGTGTTC GTATTGGGGC 

601 AGGCAGGCAA TATTGGCGAT CTGATGAAGA TGAGCCCAAT AACCGCGAAA 

651 GT T CAT AT C A TATTGCAAGT 

701 GGCTC ACCAATGTTT ATCTATGATG CCCAAAAGCA 

751 AAAGTGGTTA ATTAATGGGG TATTGCAAAC GGGCAACCCC TATATAGGAA 

801 AAAGCAATGG CTTCCAGCTG GTTCGTAAAG ATTGGTTCTA TGATGAAATC 

851 TTTGCTGGAG ATACCCATTC AGTATTCTAC GAACCACGTC AAAATGGGAA 

901 ATACTCTTTT AACGACGATA ATAATGGCAC AGGAAAAATC AATGCCAAAC 

951 ATGAACACAA TTCTCTGCCT AATAGATTAA AAACACGAAC CGTTCAATTG 

1001 TTTAATGTTT CTTTATCCGA GACAGCAAGA GAACCTGTTT ATCATGCTGC 

1051 AGGTGGTGTC AACAGTTATC GACCCAGACT GAATAATGGA GAAAATATTT 

1101 CCTTTATTGA CGAAGGAAAA GGCGAATTGA TACTTACCAG CAACATCAAT 

1151 CAAGGTGCTG GAGGATTATA TTTCCAAGGA GATTTTACGG TCTCGCCTGA 

1201 AAATAACGAA ACTTGGCAAG GCGCGGGCGT TCATATCAGT GAAGACAGTA 

1251 CCGTTACTTG GAAAGTAAAC GGCGTGGCAA ACGACCGCCT GTCCAAAATC 

1301 GGCAAAGGCA CGCTG 

// 

2101 GATAAAG 

2151 TGACTGCTTC ATTGACTAAG ACCGACATCA GCGGCAATGT CGATCTTGCC 

2201 GATCACGCTC ATTTAAATCT CACAGGGCTT GCCACACTCA ACGGCAATCT 

2251 TAGTGCAAAT GGCGATACAC GTTATACAGT CAGCCACAAC GCCACCCAAA 

2301 ACGGCAACCk TAgCCtCGtG G.sAATGcCC AAGCAACATT TAATCAAGCC 

2351 ACATTAAACG GCAACACATC GGCTTCgGGC AATGCTTCAT TTAATCTAAG 

2401 CGACCACGCC GTACAAAACG GCAGTCTGAC GCTTTCCGGC AACGCTAAGG 

2451 CAAACGTAAG CCATTCCGCA CTCAACGGTA ATGTCTCCCT AGCCGATAAG 

2501 GCAGTATTCC ATTTTGAAAG CAGCCGCTTT ACCGGACAAA TCAGCGGCGG 

2551 CAagGATACG GCATTACACT TAAAAGACAG CGAATGGACG CTGCCGTCAg 

2601 GarCGGAATT AGGCAATTTA AACCTTGACA ACGCCACCAT TACaCTCAAT 

2651 TCCGCCTATC GCCACGATGC GGCAGGGGCG CAAACCGGCA GTGCGACAGA 

2701 TGCGCCGCGC CGCCGTTCGC GCCGTTCGCG CCGTTCCCTA TTATmCGTTA 

2751 CACCGCCAAC TTCGGTAGAA TCCCGTTTCA ACACGCTGAC GGTAAACGGC 

2801 AAATTGAACG GTCAGGGAAC ATTCCGCTTT ATGTCGGAAC TCTTCGGCTA 

2851 CCGCAGCGAC AAATTGAAGC TGGCGGAAAG TTCCGAAGGC ACTTACACCT 

2901 TGGCGGTCAA CAATACCGGC AACGAACCTG CAAGCCTCGA ACAATTGACG 

2951 GTAGTGGAAG GAAAAGACAA CAAACCGCTG TCCGAAAACC TTAATTTCAC 

3001 CCTGCAAAAC GAACACGTCG ATGCAGGCGC GTGG 

// 

3551 TTAGAC CGCGTATTTG CCGAAGACCG 

3601 CCGCAACGCC GTTTGGACAA GCGGCATCCG GGACACCAAA CACTACCGTT 

3651 CGCAAGATTT CCGCGCCTAC CGCCAACAAA CCGACCTGCG CCAAATCGGT 

3701 ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC GGCATCCTGT TTTCGCACAA 

3751 CCGGACCGAA AACACCTTCG ACGACGGCAT CGGCAACTCG GCACGGCTTG 

3801 CCCACGGCGC CGTTTTCGGG CAATACGGCA TCGACAGGTT CTACATCGGC 
3851 ATCAGnCGCG GGCGCGGGTT TTAGCAGCGG CAGCCTTTcA GACGGCATCG 

3901 GAGsmAAAwT CCGCCGCCGC GTGCtGCATT ACGGCATTCA GGCACGAtAC 
3951 CGCGCCGgtt tCggCGgATt CGGCATCGAA CCGCACATCG GCGCAACGCg 
4001 ctATTTCGTC CAAAAAGCGG ATTACCGCTA CGAAAACGTC AATATCGCCA 
4051 CCCCCGGCCT TGCATTCAAC CGcTACCGCG CGGGCATTAa GGCAGATTAT 
4101 TCATTCAAAC CGGCGCAACA CATTTCCATC ACGCCTTATT TGAGCCTGTC 
4151 CTATACCGAT GCCGCTTCGG GCAAAGTCCG AACACGCGTC AATACCGCCG 
4201 TATTGGCTCA GGATTTCGGC AAAACCCGCA GTGCGGAATG GGgCGTAAAC 
4251 GCCGAAATCA AAGGTTTCAC GCTGTCCCTC CACGCTGCCG CCGCCAAAGG 

4301 CCCGCAACTG GAAGCGCAAC ACAGCGCGGG CATCAAATTA GGCTACCGCT 
4351 GGTAA... 

This corresponds to the amino acid sequence <SEQ ID 648; ORF1>: 



1 MKTTDKRTTE THRKAPKTGR IRFXAAYLAI CLSFGILPQA WAGHTYFGIN 
51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG 

101 VAALVGVQYI VSVAHNGGYN NVDFGAEGXN IXDQXRXTYK IVKRNNYKAG 
151 TKGHPYGGDY HMPRLHKXVT DAEPVEMTSY MDGRKYIDQN NYPDRVRIGA 

201 GRQYWRSDED EPNNRESSYH IAS GS PMFIYDAQKQ 

251 KWLINGVLQT GNPYIGKSNG FQLVRKDWFY DEIFAGDTHS VFYEPRQNGK 

301 YSFNDDNNGT GKINAKHEHN SLPNRLKTRT VQLFNVSLSE TAREPVYHAA 

351 GGVNSYRPRL NNGENISFID EGKGELILTS NINQGAGGLY FQGDFTVSPE 

401 NNETWQGAGV HISEDSTVTW KVNGVANDRL SKIGKGTL 

// 

701 DKVTAS LTKTDISGNV DLADHAHLNL TGLATLNGNL 

751 SANGDTRYTV SHNATQNGNX SLVXNAQATF NQATLNGNTS ASGNASFNLS 
801 DHAVQNGSLT LSGNAKANVS HSALNGNVSL ADKAVFHFES SRFTGQISGG 

851 KDTALHLKDS EWTLPSGXEL GNLNLDNATI TLNSAYRHDA AGAQTGSATD 

901 APRRRSRRSR RSLLXVTPPT SVESRFNTLT VNGKLNGQGT FRFMSELFGY 

951 RSDKLKLAES SEGTYTLAVN NTGNEPASLE QLTVVEGKDN KPLSENLNFT 

1001 LQNEHVDAGA W 

// 

1151 LDRVFAEDR 

1201 RNAVWTSGIR DTKHYRSQDF RAYRQQTDLR QIGMQKNLGS GRVGILFSHN 

1251 RTENTFDDGI GNSARLAHGA VFGQYGIDRF YIGISAGAGF SSGSLSDGIG 

1301 XKXRRRVLHY GIQARYRAGF GGFGIEPHIG ATRYFVQKAD YRYENVNIAT 

1351 PGLAFNRYRA GIKADYSFKP AQHISITPYL SLSYTDAASG KVRTRVNTAV 

1401 LAQDFGKTRS AEWGVNAEIK GFTLSLHAAA AKGPQLEAQH SAGIKLGYRW 

1451 * 

Further sequencing analysis revealed the complete nucleotide sequence <SEQ ID 649>: 

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCC TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT 

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 

351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGAAAT CCCGATCAAC 

401 ATCGTTTTAC TTATAAAATT GTGAAACGGA ATAATTATAA AGCAGGGACT 

451 AAAGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCATAAATT 

501 TGTCACAGAT GCAGAACCTG TTGAAATGAC CAGTTATATG GATGGGCGGA 

551 AATATATCGA TCAAAATAAT TACCCTGACC GTGTTCGTAT TGGGGCAGGC 

601 AGGCAATATT GGCGATCTGA TGAAGATGAG CCCAATAACC GCGAAAGTTC 

651 ATATCATATT GCAAGTGCGT ATTCTTGGCT CGTTGGTGGC AATACCTTTG 

701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG TGAAAAAATT 

751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG 

801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 

851 ATGGGGTATT GCAAACGGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC 

901 CAGCTGGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 

951 CCATTCAGTA TTCTACGAAC CACGTCAAAA TGGGAAATAC TCTTTTAACG 

1001 ACGATAATAA TGGCACAGGA AAAATCAATG CCAAACATGA ACACAATTCT 

1051 CTGCCTAATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT 

1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGTGTCAACA 

1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACGAA 

1201 GGAAAAGGCG AATTGATACT TACCAGCAAC ATCAATCAAG GTGCTGGAGG 

1251 ATTATATTTC CAAGGAGATT TTACGGTCTC GCCTGAAAAT AACGAAACTT 

1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGAAG ACAGTACCGT TACTTGGAAA 

1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT 
1401 GCACGTTCAA GCCAAAGGGG 7A7AAACC7AAGG CTCGATCAGC GTGGGCGACG 
1451 GTACAGTCAT TTTGGATCAG CAGGCAGACG ATAAAGGCAA AAAACAAGCC 
1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGTACGGTGC AACTGAATGC 
1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC 
1601 GTTTGGATTT AAACGGGCAT TCGCTTTCGT TCCACCGTAT TCAAAATACC 
1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT 
1701 TACCATTACA GGCAATAAAG ATATTGCTAC AACCGGCAAT AACAACAGCT 
1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT 
1801 ACGACCAAAA CGAACGGGCG GCTCAACCTT GTTTACCAGC CCGCCGCAGA 
1851 AGACCGCACC CTGCTGCTTT CCGGCGGAAC AAATTTAAAC GGCAACATCA 
1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCAAC ACCGCACGCC 
1951 TACAATCATT TAAACGACCA TTGGTCGCAA AAAGAGGGCA TTCCTCGCGG 
2001 GGAAATCGTG TGGGACAACG ACTGGATCAA CCGCACATTT AAAGCGGAAA 
2051 ACTTCCAAAT TAAAGGCGGA CAGGCGGTGG TTTCCCGCAA TGTTGCCAAA 
2101 GTGAAAGGCG ATTGGCATTT GAGCAATCAC GCCCAAGCAG TTTTTGGTGT 
2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC 
2201 TGACAAATTG TGTCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA 
2251 TTGACTAAGA CCGACATCAG CGGCAATGTC GATCTTGCCG ATCACGCTCA 
2301 TTTAAATCTC ACAGGGCTTG CCACACTCAA CGGCAATCTT AGTGCAAATG 
2351 GCGATACACG TTATACAGTC AGCCACAACG CCACCCAAAA CGGCAACCTT 
2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG 
2451 CAACACATCG GCTTCGGGCA ATGCTTCATT TAATCTAAGC GACCACGCCG 
2501 TACAAAACGG CAGTCTGACG CTTTCCGGCA ACGCTAAGGC AAACGTAAGC 
2551 CATTCCGCAC TCAACGGTAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA 
2601 TTTTGAAAGC AGCCGCTTTA CCGGACAAAT CAGCGGCGGC AAGGATACGG 
2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCAGG CACGGAATTA 
2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG 
2751 CCACGATGCG GCAGGGGCGC AAACCGGCAG TGCGACAGAT GCGCCGCGCC 
2801 GCCGTTCGCG CCGTTCGCGC CGTTCCCTAT TATCCGTTAC ACCGCCAACT 
2851 TCGGTAGAAT CCCGTTTCAA CACGCTGACG GTAAACGGCA AATTGAACGG 
2901 TCAGGGAACA TTCCGCTTTA TGTCGGAACT CTTCGGCTAC CGCAGCGACA 
2951 AATTGAAGCT GGCGGAAAGT TCCGAAGGCA CTTACACCTT GGCGGTCAAC 
3001 AATACCGGCA ACGAACCTGC AAGCCTCGAA CAATTGACGG TAGTGGAAGG 
3051 AAAAGACAAC AAACCGCTGT CCGAAAACCT TAATTTCACC CTGCAAAACG 
3101 AACACGTCGA TGCCGGCGCG TGGCGTTACC AACTCATCCG CAAAGACGGC 
3151 GAGTTCCGCC TGCATAATCC GGTCAAAGAA CAAGAGCTTT CCGACAAACT 
3201 CGGCAAGGCA GAAGCCAAAA AACAGGCGGA AAAAGACAAC GCGCAAAGCC 
3251 TTGACGCGCT GATTGCGGCC GGGCGCGATG CCGTCGAAAA GACAGAAAGC 
3301 GTTGCCGAAC CGGCCCGGCA GGCAGGCGGG GAAAATGTCG GCATTATGCA 
3351 GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC GGATAAAGAC ACCGCCTTGG 
3401 CGAAACAGCG CGAAGCGGAA ACCCGGCCGG CTACCACCGC CTTCCCCCGC 
3451 GCCCGCCGCG CCCGCCGGGA TTTGCCGCAA CTGCAACCCC AACCGCAGCC 
3501 CCAACCGCAG CGCGACCTGA TCAGCCGTTA TGCCAATAGC GGTTTGAGTG 
3551 AATTTTCCGC CACGCTCAAC AGCGTTTTCG CCGTACAGGA CGAATTAGAC 
3601 CGCGTATTTG CCGAAGACCG CCGCAACGCC GTTTGGACAA GCGGCATCCG 
3651 GGACACCAAA CACTACCGTT CGCAAGATTT CCGCGCCTAC CGCCAACAAA 
3701 CCGACCTGCG CCAAATCGGT ATGCAGAAAA ACCTCGGCAG CGGGCGCGTC 
3751 GGCATCCTGT TTTCGCACAA CCGGACCGAA AACACCTTCG ACGACGGCAT 
3801 CGGCAACTCG GCACGGCTTG CCCACGGCGC CGTTTTCGGG CAATACGGCA 
3851 TCGACAGGTT CTACATCGGC ATCAGCGCGG GCGCGGGTTT TAGCAGCGGC 
3901 AGCCTTTCAG ACGGCATCGG AGGCAAAATC CGCCGCCGCG TGCTGCATTA 
3951 CGGCATTCAG GCACGATACC GCGCCGGTTT CGGCGGATTC GGCATCGAAC 
4001 CGCACATCGG CGCAACGCGC TATTTCGTCC AAAAAGCGGA TTACCGCTAC 
4051 GAAAACGTCA ATATCGCCAC CCCCGGCCTT GCATTCAACC GCTACCGCGC 
4101 GGGCATTAAG GCAGATTATT CATTCAAACC GGCGCAACAC ATTTCCATCA 
4151 CGCCTTATTT GAGCCTGTCC TATACCGATG CCGCTTCGGG CAAAGTCCGA 
4201 ACACGCGTCA ATACCGCCGT ATTGGCTCAG GATTTCGGCA AAACCCGCAG 
4251 TGCGGAATGG GGCGTAAACG CCGAAATCAA AGGTTTCACG CTGTCCCTCC 
4301 ACGCTGCCGC CGCCAAAGGC CCGCAACTGG AAGCGCAACA CAGCGCGGGC 
4351 ATCAAATTAG GCTACCGCTG GTAA 



This corresponds to the amino acid sequence <SEQ ID 650; ORF1-1>: 



1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN 
51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG 
101 VAALVGDQYI VSVAHNGGYN NVDFGAEGRN PDQHRFTYKI VKRNNYKAGT 
151 KGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGRKYIDQNN YPDRVRIGAG 
201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI 
251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF 
301 QLVRKDWFYD EIFAGDTHSV FYEPRQNGKY SFNDDNNGTG KINAKHEHNS 
351 LPNRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDE 
401 GKGELILTSN INQGAGGLYF QGDFTVSPEN NETWQGAGVH ISEDSTVTWK 
451 VNGVANDRLS KIGKGTLHVQ AKGENQGSIS VGDGTVILDQ QADDKGKKQA 
501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT 
551 DEGAMIVNHN QDKESTVTIT GNKDIATTGN NNSLDSKKEI AYNGWFGEKD 
601 TTKTNGRLNL VYQPAAEDRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA 
651 YNHLNDHWSQ KEGIPRGEIV WDNDWINRTF KAENFQIKGG QAVVSRNVAK 
701 VKGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTNCVEK TITDDKVIAS 
751 LTKTDISGNV DLADHAHLNL TGLATLNGNL SANGDTRYTV SHNATQNGNL 
801 SLVGNAQATF NQATLNGNTS ASGNASFNLS DHAVQNGSLT LSGNAKANVS 
851 HSALNGNVSL ADKAVFHFES SRFTGQISGG KDTALHLKDS EWTLPSGTEL 
901 GNLNLDNATI TLNSAYRHDA AGAQTGSATD APRRRSRRSR RSLLSVTPPT 
951 SVESRFNTLT VNGKLNGQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN 
1001 NTGNEPASLE QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG 
1051 EFRLHNPVKE QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAVEKTES 
1101 VAEPARQAGG ENVGIMQAEE EKKRVQADKD TALAKQREAE TRPATTAFPR 
1151 ARRARRDLPQ LQPQPQPQPQ RDLISRYANS GLSEFSATLN SVFAVQDELD 
1201 RVFAEDRRNA VWTSGIRDTK HYRSQDFRAY RQQTDLRQIG MQKNLGSGRV 
1251 GILFSHNRTE NTFDDGIGNS ARLAHGAVFG QYGIDRFYIG ISAGAGFSSG 
1301 SLSDGIGGKI RRRVLHYGIQ ARYRAGFGGF GIEPHIGATR YFVQKADYRY 
1351 ENVNIATPGL AFNRYRAGIK ADYSFKPAQH ISITPYLSLS YTDAASGKVR 
1401 TRVNTAVLAQ DFGKTRSAEW GVNAEIKGFT LSLHAAAAKG PQLEAQHSAG 
1451 IKLGYRW* 



Computer analysis of these sequences gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF1 shows 57.8% identity over a 1456 aa overlap with an ORF (ORF1a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf1.pep  MKTTDKRTTETHRKAPKTGRIRFXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 
I I I M I I I I I I I I I I II I I I I I I I I I I I I I I | | | | | I | | | | | | | | | | | | | | | | | | || | 

orf1a     MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 



70 80 90 100 110 120 



KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYN 

I M I I I I II II I i M M I I I I I I I I f M II I I I M M II I I I I I I I I I I I I I I I I I M I 
KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYN 

70 80 90 100 110 120 

130 140 150 160 170 180 

NVDFGAEGXNIXDQXRXTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSY 

MINIUM II I : I : I II I I I I I :: I I |: I I I I I I I I I || I I I 

NVDFGAEGXN— PDQHRFSYQIVKRNNYKPDNS— HPYNGDXHMPRLHKFVTDAEPVEMTSD 

130 140 150 160 170 

190 200 210 
MDGRKYIDQNNYPDRVRIGAGRQYWRSDEDEP NN 

II I I : : : I I M I II I : I : : I I I I : I : | | 
MRGNTYSDKEKYPERVRIGSGHHYWRYDDDKHGDLSYSGAWLIGGNTHMQGWGNNGVXSL 

180 190 200 210 220 230 

220 230 240 250 260 

RESSYH IA SGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGFQLVRK 

I ■• M M I I I M M : : I N : i I II M I M I : I M i I M I 

SGDVRHANDYGPMPIAGAAGDSGSPMFIYDKTNNKWLLNGVLQTGYPYSGRENGFQLIRK 
240 250 260 270 280 290 

270 280 290 300 310 320 

DWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTVQLFNV 

U : I I I I : i : I I I : II : : I I : : : I I I I I : : : I : I | : | | : : | | : | | : 

DWFYDDIYRGDTHTVXFEPRSNGHFSFTSNNNGTGTVTETNEKVSNP-KLKVQTVRLFDE 
300 310 320 330 340 350 
orf1.pep  SLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYFQGDFT 
||:|| : I I |||ll|:IMlllllll:IMI I : I : I I I : : I I I I I I I I I I 1 : 1 I I I 
orf1a     SLNETDKEPVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLYFEGDFT 
360 370 380 390 400 410 

390 400 410 420 430 

VS PENNE TWQGAGVH I SED STVTWKVNGVANDRL SKI GKGTL 

| M M M I I I I I I I II I I 1 I II M I I I I M I I I I I I I I I I I I 

VSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSISVGDGT 
420 430 440 450 460 470 



VILDQQADDKGKKQAFSEIGLXSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGHSLSFH 
480 490 500 510 520 530 



orf l.pep 

20 

orf la RiQNTDEGAMIXXHNATTTSTVTITGNESITQPSGKNINRLNYSKEIAYNGWFGEKDTTK 
540 550 560 570 580 590 



orf 1 . pep 
orf la 



orf la IPQGEIWDNDWIXRTFKAENFHIQGGQAVISRNVAKVEGDXHLSNHAQAVFGVAPHQSH 
660 670 680 690 700 710 

35 

440 450 460 470 480 

orfl pep XXXXXDKVTASLTKTDISGNVDLADHAHLNLTGLATLNGNLSAN 

: M : I I I I I I I i II I I I I I : I I : I I I I M I 

orf la TICTRSDWTGLTNCVEXXITDDKVIASLTKTDXSGXVXLXXXXXXXLXGXAXLXGNLSAN 
40 720 730 740 750 760 770 

490 500 510 520 530 540 

orfl . pep GDTRYTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLTLSG 
I | I I I I I I I I I I II I I III I I I I I I I I I I I I I I : I I I I I I I I I I :: I : I I I I I I M 
45 orf la GDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNXSXSGNASFNLSNNAAQNGSLTLSD 

780 790 800 810 820 830 

550 560 570 580 590 600 

orfl . pep NAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGNL 
50 I I I I I I I I I I I I I I I I II I I I I I I I I : M I I I I : I I : I I I II I I I I I I I I I I I : I I II I 

orf la NAKANVSHSALNGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSGTELGNL 
840 850 860 870 880 890 

610 620 630 640 650 660 

55 orfl . pep NLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVESRFNTLTVNG 

I ! I I II ! II I I I I I I I i II I I I I :: I : I I I I I I I I II I I I I II II II I II II M I 

orf la N L DN AT I T LN S AYRHD AAGAQT GXV S DT PRRRS RR S LLSVTPPTSVESRFNTLTVNG 

900 910 920 930 940 950 

60 670 680 690 700 710 720 

orf l.pep KLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPAS LEQLTVVEGKDNKPL 

I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II I I : I I : I I I II I I I I I I I I 
orf la KLNXQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPVSLDQLTWEGKDNKPL 
960 970 980 990 1000 1010 

65 

730 740 750 

orf l.pep SEN LN FT LQNEHVDAGAW 

I I I I I I I I I I I II I I I I I 

orf la SENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAEAKKQAEKDNAQS 
70 1020 1030 1040 1050 1060 1070 
orf1.pep 



LDALIAAGRDAAEKTESVAEPARXAGGENVGIMQAEEEKKRVQADKDSALAKQREAETRP 
1080 1090 1100 1110 1120 1130 



XTTAFPRARXARRDLPQPQPQPQPQPQPQRDLXSRYANSGLSEFSATLNSVFAVQDELDR 

1140 1150 1160 1170 1180 1190 

770 780 790 800 810 820 

VFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN 
| 1 | | | [ | | | I I I I II I I i I I I M I 1 I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I 
VFAEDRRNAVWTSXIRXTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNRTEN 
1200 1210 1220 1230 1240 1250 

830 840 850 860 870 880 

TFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGIGXKXRRRVLHYGIQA 

: 1 M M I II I I I I II M I I I M I II llll:IIIMM MINI I I I I I I I I M I I 
XFDDGIGNSARLAHGAVFGQYGIGRFDIGISTGAGFSSGXLSDGIGGKIRRRVLHYGIQA 
1260 1270 1280 1290 1300 1310 

890 900 910 920 930 940 

RYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPAQHI 

I | | | I I I 1 I 1 I I I : I I I I I I II I I II I I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I 
RYRAGFGGFGIEPYIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSFKPAQHX 

1320 1330 1340 1350 1360 1370 

950 960 970 980 990 1000 

SITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGP 
MMI I I I I I I I I II I I I I I i I I I I I I I I I I I I I M I I II I I I II I I I II I I I II I I I 
SITPYXSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSXHAAAAKGP 
1380 1390 1400 1410 1420 1430 



1010 1020 
orfl.pep QLEAQHSAGIKLGYRWX 
I I i I I II I I I I I I I I M 
orfla QLEAQHSAGIKLGYRWX 
1440 1450 

The complete length ORF1a nucleotide sequence <SEQ ID 651> is:

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA 

51 AACCGGCCGC ATCCGCTTCT CGCCTGCTTA CTTAGCCATA TGCCTGTCGT 

101 TCGGCATTCT TCCCCAAGCT TGGGCGGGAC ACACTTATTT CGGCATCAAC 

151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG 

201 GGCGAAAGAT ATTGAGGTNT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT

251 CAATGACAAA AGCCCCGATG ATTGATTTTT CTGTGGTGTC GCGTAACGGC 

301 GTGGCGGCAT TGGTGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG 

351 CGGCTATAAC AACGTTGATT TTGGTGCGGA AGGAAGNAAT CCCGATCAGC 

401 ACCGTTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA GCCTGACAAT

451 TCACACCCTT ACAACGGCGA TTANCATATG CCGCGTTTGC ATAAATTTGT

501 CACAGATGCA GAACCTGTCG AAATGACGAG TGACATGAGG GGGAATACCT 

551 ATTCCGATAA AGAAAAATAT CCCGAGCGTG TCCGCATCGG CTCAGGACAC 

601 CACTATTGGC GTTATGATGA TGACAAACAC GGCGATTTAT CCTACTCCGG 

651 CGCATGGTTA ATTGGCGGCA ATACACATAT GCAGGGTTGG GGAAATAATG 

701 GCGTANTTAG TTTGAGCGGC GATGTGCGCC ATGCCAACGA CTATGGCCCT

751 ATGCCGATTG CAGGTGCGGC AGGCGACAGC GGTTCGCCAA TGTTTATTTA

801 TGACAAAACA AACAATAAAT GGCTGCTCAA CGGAGTTTTA CAAACCGGCT
851 ACCCTTATTC CGGCAGGGAA AACGGTTTCC AGCTGATACG CAAAGATTGG 
901 TTCTACGATG ACATTTACAG AGGCGATACA CATACCGTCT NTTTTGAACC 
951 GCGCAGTAAC GGACATTTTT CCTTTACATC CAACAACAAC GGTACGGGTA 

1001 CGGTAACAGA AACCAACGAA AAGGTNTCCA ATCCAAAGCT TAAAGTACAG 

1051 ACAGTCCGAC TGTTTGACGA ATCTTTGAAT GAAACTGATA AAGAACCAGT 

1101 TTACGCGGCA GGGGGTGTTA ATCAGTACCG TCCAAGGTTA AACAACGGTG

1151 AAAACCTTTC TTTTATCGAT TACGGCAACG GCAAACTCAT CTTATCAAAC 

1201 AACATCAACC AAGGCGCGGG CGGTTTGTAT TTTGAAGGTG ATTTTACGGT
1251 CTCGCCTGAA AACAACGAAA CGTGGCAAGG CGCGGGCGTT CATATCAGTG

1301 AAGACAGTAC CGTTACTTGG AAAGTAAACG GCGTGGCAAA CGACCGCCTG 

1351 TCCAAAATCG GCAAAGGCAC GCTGCACGTT CAAGCCAAAG GGGAAAACCA 

1401 AGGCTCGATC AGCGTGGGCG ACGGTACAGT CATTTTGGAT CAGCAGGCAG

1451 ACGATAAAGG CAAAAAACAA GCCTTTAGTG AAATCGGCTT GNTCAGCGGC

1501 AGGGGTACGG TGCAACTGAA TGCCGATAAT CAGTTCAACC CCGACAAACT 

1551 CTATTTCGGC TTTCGCGGCG GACGTTTGGA TTTAAACGGG CATTCGCTTT 

1601 CGTTCCACCG TATTCAAAAT ACCGATGAAG GGGCGATGAT TGNCNATCAT

1651 AATGCCACAA CAACATCCAC CGTTACCATT ACAGGGAATG AAAGTATTAC 

1701 ACAACCGAGT GGTAAGAATA TCAATAGACT TAATTACAGC AAAGAAATTG

1751 CCTACAACGG TTGGTTTGGC GAGAAAGATA CGACCAAAAC GAACGGGCGG

1801 CTCAACCTTG TTTACCAGCC CGCCGCAGAA GACCGCACCC NGCTGCTTTC 

1851 CGGCGGAACA AATTTAAACG GCAACATCAC GCAAACAAAC GGCAAACTGT 

1901 TTTTCAGCGG CAGACCGACA CCGCACGCCT ACAATCATTT AGGAAGCGGG 

1951 TGGTCAAAAA TGGAAGGTAT CCCACAAGGA GAAATCGTGT GGGACAACGA 

2001 CTGGATCNAC CGCACGTTTA AAGCGGAAAA TTTCCATATT CAGGGCGGGC

2051 AGGCGGTGAT TTCCCGCAAT GTTGCCAAAG TGGAAGGCGA TTGNCATTTG

2101 AGCAATCACG CCCAAGCAGT TTTTGGTGTC GCACCGCATC AAAGCCATAC 

2151 AATCTGTACA CGTTCGGACT GGACNGGTCT GACAAATTGT GTCGAANAAA 

2201 NCATTACCGA CGATAAAGTG ATTGCTTCAT TGACTAAGAC NGACNTNAGC 

2251 GGCANTGTNA GNCTNNCCNA TNACGNTNNT TNAAANCTCN CNGGGCNTGC 

2301 NNCACTNAAN GGCAATCTTA GTGCAAATGG CGATACACGT TATACAGTCA 

2351 GCCACAACGC CACCCAAAAC GGCAACCTTA GCCTCGTGGG CAATGCCCAA 

2401 GCAACATTTA ATCAAGCCAC ATTAAACGGC AACNCATCGG NTTCGGGCAA 

2451 TGCTTCATTT AATCTAAGCA ACAACGCCGC ACAAAACGGC AGTCTGACGC

2501 TTTCCGACAA CGCTAAGGCA AACGTAAGCC ATTCCGCACT CAACGGCAAT 

2551 GTCTCCCTAG CCGATAAGGC AGTATTCCAT TTTGAAAACA GCCGCTTTAC 

2601 CGGACAACTC AGCGGCAGCA AGGANACAGC ATTACACTTA AAAGACAGCG

2651 AATGGACGCT GCCGTCAGGC ACGGAATTAG GCAATTTAAA CCTTGACAAC

2701 GCCACCATTA CACTCAATTC CGCCTATCGC CACGATGCTG CAGGCGCGCA 

2751 AACCGGCAGN GTGTCAGACA CGCCGCGCCG CCGTTCGCGC CGTTCCCTAT 

2801 TATCCGTTAC ACCGCCAACT TCGGTAGAAT CCCGTTTCAA CACGCTGACG 

2851 GTAAACGGCA AATTGAACNG TCAAGGAACA TTCCGCTTTA TGTCGGAACT 

2901 CTTCGGCTAC CGAAGCGACA AATTGAAGCT GGCGGAAAGT TCCGAAGGNA

2951 CTTACACCTT GGCGGTCAAC AATACCGGCA ACGAACCCGT AAGCCTCGAT

3001 CAATTGACGG TAGTGGAAGG GAAAGACAAC AAACCGCTGT CCGAAAACCT 

3051 TAATTTCACC CTGCAAAACG AACACGTCGA TGCCGGCGCG TGGCGTTACC 

3101 AACTCATCCG CAAAGACGGC GAGTTCCGCC TGCATAATCC GGTCAAAGAA 

3151 CAAGAGCTTT CCGACAAACT CGGCAAGGCA GAAGCCAAAA AACAGGCGGA 

3201 AAAAGACAAC GCGCAAAGCC TTGACGCGCT GATTGCGGCC GGGCGCGATG 

3251 CCGCCGAAAA GACAGAAAGC GTTGCCGAAC CGGCCCGGCN GGCAGGCGGG 

3301 GAAAATGTCG GCATTATGCA GGCGGAGGAA GAGAAAAAAC GGGTGCAGGC 

3351 GGATAAAGAC AGCGCNTTGG CGAAACAGCG CGAAGCGGAA ACCCGGCCGG 

3401 NTACCACCGC CTTCCCCCGC GCCCGCNGCG CCCGCCGGGA TTTGCCGCAA

3451 CCGCAGCCCC AACCGCAACC TCAACCCCAA CCGCAGCGCG ACCTGATNAG 

3501 CCGTTATGCC AATAGCGGTT TGAGTGAATT TTCCGCCACG CTCAACAGCG 

3551 TTTTCGCCGT ACAGGACGAA TTGGACCGCG TGTTTGCCGA AGACCGCCGC 

3601 AACGCNGTTT GGACAAGCNG CATCCGGNAC ACCAAACACT ACCGTTCGCA 

3651 AGATTTCCGC GCCTACCGCC AACAAACCGA CCTGCGCCAA ATCGGTATGC 

3701 AGAAAAACCT CGGCAGCGGG CGCGTCGGCA TCCTGTTTTC GCACAACCGG

3751 ACCGAAAACA NCTTCGACGA CGGCATCGGC AACTCGGCAC GGCTTGCCCA

3801 CGGCGCCGTT TTCGGGCAAT ACGGCATCGG CAGGTTCGAC ATCGGCATCA 

3851 GCACGGGCGC GGGTTTTAGC AGCGGCANTC TNTCAGACGG CATCGGAGGC 

3901 AAAATCCGCC GCCGCGTGCT GCATTACGGC ATTCAGGCAC GATACCGCGC 

3951 CGGTTTCGGC GGATTCGGCA TCGAACCGTA CATCGGCGCA ACGCGCTATT

4001 TCGTCCAAAA AGCGGATTAC CGCTACGAAA ACGTCAATAT CGCCACCCCC

4051 GGTCTTGCGT TCAACCGNTA CCGNGCGGGC ATTAAGGCAG ATTATTCATT

4101 CAAACCGGCG CAACACATNT CCATCACNCC TTATTTNAGC CTGTCCTATA 

4151 CCGATGCCGC TTCGGGCAAA GTCCGAACAC GCGTCAATAC CGCNGTATTG 

4201 GCTCAGGATT TCGGCAAAAC CCGCAGTGCG GAATGGGGCG TAAACGCCGA

4251 AATCAAAGGT TTCACGCTGT CCNTCCACGC TGCCGCCGCC AAAGGNCCGC 

4301 AACTGGAAGC GCAACACAGC GCGGGCATCA AATTAGGCTA CCGCTGGTAA

This encodes a protein having amino acid sequence <SEQ ID 652>: 

1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA WAGHTYFGIN

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG 

101 VAALVGDQYI VSVAHNGGYN NVDFGAEGXN PDQHRFSYQI VKRNNYKPDN 

151 SHPYNGDXHM PRLHKFVTDA EPVEMTSDMR GNTYSDKEKY PERVRIGSGH 

201 HYWRYDDDKH GDLSYSGAWL IGGNTHMQGW GNNGVXSLSG DVRHANDYGP 






251 MPIAGAAGDS GSPMFIYDKT NNKWLLNGVL QTGYPYSGRE NGFQLIRKDW
301 FYDDIYRGDT HTVXFEPRSN GHFSFTSNNN GTGTVTETNE KVSNPKLKVQ
351 TVRLFDESLN ETDKEPVYAA GGVNQYRPRL NNGENLSFID YGNGKLILSN
401 NINQGAGGLY FEGDFTVSPE NNETWQGAGV HISEDSTVTW KVNGVANDRL
451 SKIGKGTLHV QAKGENQGSI SVGDGTVILD QQADDKGKKQ AFSEIGLXSG
501 RGTVQLNADN QFNPDKLYFG FRGGRLDLNG HSLSFHRIQN TDEGAMIXXH
551 NATTTSTVTI TGNESITQPS GKNINRLNYS KEIAYNGWFG EKDTTKTNGR
601 LNLVYQPAAE DRTXLLSGGT NLNGNITQTN GKLFFSGRPT PHAYNHLGSG
651 WSKMEGIPQG EIVWDNDWIX RTFKAENFHI QGGQAVISRN VAKVEGDXHL
701 SNHAQAVFGV APHQSHTICT RSDWTGLTNC VEXXITDDKV IASLTKTDXS
751 GXVXLXXXXX XXLXGXAXLX GNLSANGDTR YTVSHNATQN GNLSLVGNAQ
801 ATFNQATLNG NXSXSGNASF NLSNNAAQNG SLTLSDNAKA NVSHSALNGN
851 VSLADKAVFH FENSRFTGQL SGSKXTALHL KDSEWTLPSG TELGNLNLDN
901 ATITLNSAYR HDAAGAQTGX VSDTPRRRSR RSLLSVTPPT SVESRFNTLT
951 VNGKLNXQGT FRFMSELFGY RSDKLKLAES SEGTYTLAVN NTGNEPVSLD
1001 QLTVVEGKDN KPLSENLNFT LQNEHVDAGA WRYQLIRKDG EFRLHNPVKE
1051 QELSDKLGKA EAKKQAEKDN AQSLDALIAA GRDAAEKTES VAEPARXAGG
1101 ENVGIMQAEE EKKRVQADKD SALAKQREAE TRPXTTAFPR ARXARRDLPQ
1151 PQPQPQPQPQ PQRDLXSRYA NSGLSEFSAT LNSVFAVQDE LDRVFAEDRR
1201 NAVWTSXIRX TKHYRSQDFR AYRQQTDLRQ IGMQKNLGSG RVGILFSHNR
1251 TENXFDDGIG NSARLAHGAV FGQYGIGRFD IGISTGAGFS SGXLSDGIGG
1301 KIRRRVLHYG IQARYRAGFG GFGIEPYIGA TRYFVQKADY RYENVNIATP
1351 GLAFNRYRAG IKADYSFKPA QHXSITPYXS LSYTDAASGK VRTRVNTAVL
1401 AQDFGKTRSA EWGVNAEIKG FTLSXHAAAA KGPQLEAQHS AGIKLGYRW*



A transmembrane region is underlined.
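The correspondence between the nucleotide listing <SEQ ID 651> and the amino acid listing <SEQ ID 652> can be spot-checked with a short translation routine. The following Python sketch is illustrative only and is not part of the application as filed: the function name `translate` is hypothetical, the standard genetic code is assumed, and any codon containing an ambiguous base (such as N) is conservatively reported as X, even where the listing itself resolves such a codon to a single residue.

```python
# Standard genetic code; codons are ordered with bases T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(nucleotides: str) -> str:
    """Translate a DNA string in reading frame 1 (hypothetical helper).

    Whitespace between the 10-base groups of the listing is ignored.
    Codons containing anything other than A, C, G or T are reported as 'X'.
    A trailing partial codon is dropped.
    """
    seq = "".join(nucleotides.split()).upper()
    usable = len(seq) - len(seq) % 3
    return "".join(
        CODON_TABLE.get(seq[start:start + 3], "X")
        for start in range(0, usable, 3)
    )


# First row of <SEQ ID 651> as printed in the listing.
first_row = "ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCGAA"
print(translate(first_row))
```

Applied to the first row of the nucleotide listing, the routine returns the first sixteen residues of the protein listing; the last two bases of the row belong to a codon completed on the next row and are dropped.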



ORF1-1 shows 86.3% identity over a 1462 aa overlap with ORF1a:



MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 
I I I I I I I I i I I I I I I I 1 I I I I M I I 1 I I I M [ I I I I II I I II I I I I I I! I I I I I I [ [ || I 
MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 
10 20 30 40 50 60 

70 80 90 100 110 120 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYN 

N M I I I I I I I I I I I I I I I I I I I I II I I I Mill 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYN 

70 80 90 100 110 120 

130 140 150 160 170 179 

NVDFGAEGXNPDQHRFSYQIVKRNNYKPDNS-HPYNGDXHMPRLHKFVTDAEPVEMTSDM 

I I I I M N I I I I I I I : I : I I I I I I I I : : I I I : I I I I '1:111 | 

NVDFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 

130 140 150 160 170 180 

180 190 200 210 220 230 
RGNTYSDKEKYPERVRIGSGHHYWRYDDDKHGDL — SYSGA WLIGGNTHMQGWGNN 

DGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 
190 200 210 220 230 240 

240 250 260 270 280 290 

GVXSLSGD-VRHANDYGPMPIAGAAGDSGSPMFIYDKTNNKWLLNGVLQTGYPYSGRENG 

GTVNLGSEKIKHS-PYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNG 
250 260 270 280 290 

300 310 320 330 340 350 

FQLIRKDWFYDDIYRGDTHTVXFE PRSNGHFSFTSNNNGTGTVTETNEKVSNP-KLKVQT 

I I I : I I I I I I i : I : I I I ] : I : I I I : I | : : | | : : : : : : | : | | . M . . , 

FQLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRT 
300 310 320 330 340 350 

360 370 380 390 400 410 

VRLFDESLNETDKEPVY-AAGGVNQYRPRLNNGENLSFIDYGNGKLILSNNINQGAGGLY 
I : I I : I I : I I : I M I I II I I I : M I I I I II M : N I [ | : f : | | I :: I I I I I I I | | | 
VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLY 
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360 370 380 390 400 4iu 

420 430 440 450 460 470 

FEGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGEHQGSI 

|:||1I1IIII1IIMIIM1IMIIIIMM111IIIIIII1MMIIIMMM(NI 

FQGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSI 
420 430 440 450 460 470 

480 490 500 510 520 530 

SVGDGTVILDQQADDKGKKQAFSEIGLXSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 

| | | | M | | I I I I I I I I 1 I I I I I I I M i I I I I I I I I! I I I 

SVGDGTVILDQQADDKGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 
480 490 500 510 520 530 

540 550 560 570 580 590 

HSLSFHRIQNTDEGAMIXXHNATTTSTVTITGNESITQPSGKNINRLNYSKEIAYNGWFG 

| i || | | | | | || I I I I I I II I I I 1 I I I I = = I : : I : I I I : : I I I ! I I I I I I 
HSLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDIAT-TGNN-NSLDSKKEIAYNGWFG 
540 550 560 570 580 590 

600 610 620 630 640 650 

EKDTTKTNGRLNLVYQPAAEDRTXLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSG 

| 1 M | | | M [ I I I I I I M 1 M II 1 I I I I I I 1 I I I I M M II I I M II II I I M II I : : 

EKDTTKTNGRLNLVYQPAAEDRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLNDH 
600 610 620 630 640 650 

660 670 680 690 700 710 

WSKMEGIPQGEIVWDNDWIXRTFKAENFHIQGGQAVISRNVAKVEGDXHLSNHAQAVFGV 
||: I II I : II I I I I I M I I I M I I I I : I : 1 I I I I : I I I II I I : I I M M I I I I I M I 
WSQKEGIPRGEIVWDNDWINRTFKAENFQIKGGQAWSRNVAKVKGDWHLSNHAQAVFGV 
660 670 680 690 700 710 

720 730 740 750 760 770 

APHQSHTICTRSDWTGLTNCVEXXITDDKVIASLTKTDXSGXVXLXXXXXXXLXGXAXLX 
I | I I I I I I II II I I I ] I I I I I I : II I II I I I I I I I 1 I M I I : 1 = 1 

APHQSHTICTRSDWTGLTNCVEKTITDDKVIASLTKTDISGNVDLADHAHLNLTGLATLN 
720 730 740 750 760 770 

780 790 800 810 820 830 

GNLSANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNXSXSGNASFNLSNNAAQNG 

| | | | | | | II II II I I I II I I I I II II I I I II I I M I I I I I I = I I I I I I I I I I I : I I I 
GNLSANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNT SASGNASFNLSDHAVQNG 
780 790 800 810 820 830 

840 850 860 870 880 890 

SLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGQLSGSKXTALHLKDSEWTLPSG 

Mill I I I I I I I I I I I I I II I I I I I II I I I I : I II M I : II : I II II I I I I I I I I I I I 
SLTLSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSG 
840 850 860 870 880 890 

900 910 920 930 940 
TELGNLNLDNATITLNSAYRHDAAGAQTGXVSDTPRRRSRRS LLSVTPPTSVESRFN 

I I I I I I I I I I I I I I I I M I I I I II I I II I :: I : I I I I I I I I I I I I M I I I I I I I I I 

TELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRS LLSVTPPTSVESRFN 
900 910 920 930 940 950 

950 960 970 980 990 1000 

TLTVNGKLNXQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPVSLDQLTWEG 
I I I I I I I I I i II I I I I I I I I I I I I M I I I 11 II II I I I I I I I II I I I I : I I = I I I I I I I 
TLTVNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLTWEG 
960 970 980 990 1000 1010 

1010 1020 1030 1040 1050 1060 

KDNKP L S EN LN FT LQNEHVDAGAWRYQL IRKDGE FRLHNPVKEQE L S DKLGKAE AKKQAE 
I I I I I I I I I I I 11 I M I I I I I I I I I I I I I II I I I I I I I I I I I I I I M I I I I 1 I 1 I 1 I I I I 
KDNKPLSENLN FTLQNEHVDAGAWRYQL IRKDGE FRLHNPVKEQELSDKLGKAEAKKQAE 

1020 1030 1040 1050 1060 1070 



70 



orf la. pep 



1070 1080 1090 1100 1110 1120 

KDNAQSLDALIAAGRDAAEKTESVAEPARXAGGENVGIMQAEEEKKRVQADKDSALAKQR 
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I | | | | | | | | | I I I I II I : I I II I I I I I I I I I I I I I I I I I I I II M : I I I M I 

or fi_l KDNA Q S LDALIAAGRDAVEKTESVAEPARQAGGENVGIMQAEEEKKRVQADKDTALAKQR 
1080 1090 1100 1110 1120 1130 

1130 1140 1150 1160 1170 1180 

orfla oep EA ETRPXTTAFPRARXARRDLPQPQPQPQPQPQPQRDLXSRYANSGLSEFSATLNSVFAV 

I I I I M M ! I 1 I I I I I I I I I 1 I I I 1 I 1 I I I I I I I I I I I I I I I I I I I I I 

orfl-1 EAETRPATTAFPRARRARRDLPQLQPQPQPQP— QRDLISRYANSGLSEFSATLNSVFAV 

1140 1150 1160 1170 1180 1190 

1190 1200 1210 1220 1230 1240 

orfla pep QDELDRVFAEDRRNAVWTSXIRXTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 

I l I I I l I I I || I I I I I II I II I I I I I M I I I I I I I I I I I I I I I M I I I I I I ! I I I M I 

orfl-1 QDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFS 
1200 1210 1220 1230 1240 1250 

1250 1260 1270 1280 1290 1300 

orfla pep HNRTENXFDDGIGNSARLAHGAVFGQYGIGRFDIGISTGAGFSSGXLSDGIGGKIRRRVL 
I | | | I I : I I I II I I I I I I I I I I I I I I I I 1 II 1111:1111111 I I I I I I I I I 1 I I I I 
orfl-1 HNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGIGGKIRRRVL 
1260 1270 1280 1290 1300 1310 

1310 1320 1330 1340 1350 1360 

orfla pep HYGIQARYRAGFGGFGIEPYIGATRYFVQKADYRYENVNIATPGLAFNRYRAGIKADYSF 
M I I I I I I I I M I I II I I I : I I I I M I I I I I M I I I I I I I I I I I I I I I I I I I I I I I M I I 
orfl-1 HYGI QARYRAGFGGFG IEPHI GATRYFVQKADYRYENVN I AT PGLAFNRYRAG I KAD Y S F 

1320 1330 1340 1350 1360 1370 

1370 1380 1390 1400 1410 1420 

orfla . pep KPAQHXSITPYXSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSXHA 

I I I I I I I I II I I I I I I M I I II I I I I I I I I II I I I II I I I I I I I I I I I I I I I I I I II 

orfl-1 KPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHA 

1380 1390 1400 1410 1420 1430 

1430 1440 1450 

orfla. pep AAAKGPQLEAQHSAGIKLGYRWX 

II i I I I I I I I I I I I I II I I II I I 
orfl-1 AAAKGPQLEAQHSAGIKLGYRWX 

1440 1450 
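A percent-identity figure of the kind quoted for these comparisons can be illustrated with a small calculation over two aligned strings. This Python sketch is an assumption-laden illustration, not the program used to produce the figures in this document, which is not identified here. The helper name `percent_identity` is hypothetical. The sketch assumes one common convention: identical, non-gap columns divided by the total number of alignment columns (the overlap), with X and gap positions counted as non-identical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-"):
    """Return (identical columns, overlap length, percent identity).

    Both inputs must already be aligned, i.e. of equal length with gap
    characters inserted. Hypothetical helper for illustration only.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    overlap = len(aligned_a)
    identical = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return identical, overlap, 100.0 * identical / overlap


# First 20 columns of one block of the ORF1a / ORF1-1 alignment above.
orf1a_fragment = "NVDFGAEGXNPDQHRFSYQI"
orf1_1_fragment = "NVDFGAEGRNPDQHRFTYKI"
identical, overlap, percent = percent_identity(orf1a_fragment, orf1_1_fragment)
print(f"{identical}/{overlap} identical columns = {percent:.1f}% identity")
```

Other conventions, for example excluding gap columns from the denominator, give slightly different percentages, so figures computed this way need not reproduce the quoted values exactly.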

Homology with adhesion and penetration protein hap precursor of H. influenzae (accession number P45387)
Amino acids 23-423 of ORF1 show 59% aa identity with hap protein in 450aa overlap: 

FXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAENKGKFAVGAKDIEVYNKKGELVG 82 
F +L C+S GI QAWAGHTYFGI+YQYYRDFAENKGKF VGAK+IEVYNK+G+LVG 
FRLNFLTACVSLGIASQAWAGHTYFGIDYQYYRDFAENKGKFTVGAKNIEVYNKEGQLVG 65 

KSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYNNVDFGAEGXNIXDQXRXTYKIV 142 

SMTKAPMIDFSVVSRNGVAALVG QYIVSVAHNGGYN+VDFGAEG N DQ R TY+IV 
TSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYNDVDFGAEGRN-PDQHRFTYQIV 124 

KRNNYKAGTKGHPYGGDYHMPRLHKXVTDAEPVEMTSYMDGRKYIDQNNYPDRVRIGAGR 202 
KRNNY+A + HPY GDYHMPRLHK VT+AEPV MT+ MDG+ Y D+ NYP+RVRIG+GR 



QYWR+D+DE N SSY+++ 





orfl 


23 


45 


hap 
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orfl 


83 




hap 


66 


50 








orfl 


143 
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125 
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orfl 


203 
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orfl 


223 


60 










245 




orfl 


278 


65 


hap 


305 
orf1 335 AGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYFQGDFTV-SPENNETWQGA 393
         A G N Y+PR+  G+NI   D+GKG L + +NINQGAGGLYF+G+F V   +NN TWQGA
hap  364 AAGYNIYQPRMEYGKNIYLGDQGKGTLTIENNINQGAGGLYFEGNFVVKGKQNNITWQGA 423

orf1 394 GVHISEDSTVTWKVNGVANDRLSKIGKGTL 423
         GV I +D+TV WKV+   NDRLSKIG GTL
hap  424 GVSIGQDATVEWKVHNPENDRLSKIGIGTL 453

Amino acids 715-1011 of ORF1 show 50% aa identity with hap protein in 258 aa overlap:

Orfl 41 DTRYTVSHNATQ-NGNXSLVXNAQATFNQ-ATLNGNTSASGNASFNLSDHAVQNGSLTLS 98 
in DT+ S TQ NG+ +L NA + A LNGN + ++ F LS++A Q G++ LS 

hap 733 DTKVINSIPITQINGSINLTNNATVNIHGLAKLNGNVTLIDHSQFTLSNNATQTGNIKLS 792 

orfl 99 GNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGXELGN 158 
+A A V+++ LNGNV L D A F ++S F QI G KDT + L+++ WT+PS L N 
15 hap 7 93 NHANATVNNATLNGNVHLTDSAQFSLKNSHFWHQIQGDKDTTVTLENATWTMPSDTTLQN 852 

orfl 159 LNLDNATITLNSAYRHDAAGAQTGSATDAPXXXXXXXXXXLLXVTPPTSVESRFNTLTVN 218 

L L+N+T+TLNSAY + S+ +AP L T PTS E RFNTLTVN 

hap 853 LTLNNSTVTLKSAY S AS SNNAPRHRRS LETETTPTSAEHRFNTLTVN 8 99 

20 

orfl 219 GKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNE PASLEQLTWEGKDNKP 278 

GKL+GQGTF+F S LFGY+SDKLKL+ +EG YTL+V NTG EP +LEQLT++E DNKP 
hap 900 GKLSGQGTFQFTSSLFGYKSDKLKLSNDAEGDYTLSVRNTGKEPVTLEQLTLIESLDNKP 959 

25 orfl 27 9 LSENLNFTLQNEHVDAGA 2 96 

LS+ L FTL+N+HVDAGA 
hap 960 LSDKLKFT LENDHVDAGA 977 

Amino acids 1192-1450 of ORF1 show 41% aa identity with hap protein in 259aa overlap: 

Orfl 1 LDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLGSGRVGILFSHNR 60 

30 LDR+F + ++AVWT+ +D + Y S FRAY+Q+T+LRQIG+QK L +GR+G +FSH+R 

hap 1135 LDRLFVDQAQSAVWTNIAQDKRRYDSDAFRAYQQKTNLRQIGVQKALANGRIGAVFSHSR 1194 

orfl 61 TENTFDDGIGNSARLAHGAVFGQYGIDRFYXXXXXXXXXXXXXXXXXIGXKXRRRVLHYG 120 
++NTFD+ +NAL+FQY KR+ ++YG 

35 hap 1195 SDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISASKMAEEQSRKIHRKA1NYG 1254 

orfl 121 IQARYRAGFGGFGIE PH IGATRYFVQKADYRYENVN I AT PGLAFNRYRAG IKADYS FKPA 18 0 

+ A Y+ G GI+P+ G RYF+++ +Y+ E V + TP LAFNRY AGI+ DY+F P 
hap 1255 VNASYQFRLGQLGIQPYFGVNRYFIERENYQSEEVRVKTPSLAFNRYNAGIRVDYTFTPT 1314 

40 

orfl 181 QHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEIKGFTLSLHAAAA 240 

+IS+ PY ++Y D ++ V+T VN VL Q FG+ E G+ AEI F +S + + 
hap 1315 DNISVKPYFFVNYVDVSNANVQTTVNLTVLQQPFGRYWQKEVGLKAEILHFQISAFISKS 1374 

45 orfl 241 KGPQLEAQHSAGIKLGYRW 259 

+G QL Q + G+KLGYRW 
hap 1375 QGSQLGKQQNVGVKLGYRW 1393 

Homology with a predicted ORF from N. gonorrhoeae 
The blocks of ORF1 show 83.5%, 88.3%, and 97.7% identities in 467, 298, and 259 aa overlaps,
respectively, with a predicted ORF (ORF1ng) from N. gonorrhoeae:

orfl .pep MKTTDKRTTETHRKAPKTGRIRFXAAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 60 

I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I 
orflng MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 60 

55 

orfl -pep KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGVQYIVSVAHNGGYN 120 

I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I : ! I M I I I I I I I I I I 
orflng KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQYIVSVAHNGGYN 120 

60 orfl. pep NVD FGAE GXN IXDQXRXT YKI VKRNNYKAGTKGH P YGG D YHMPRLHKXVT DAE PVEMT S Y 180 

I I I I I I I I II I : I : I I II I I I I I I I : I I I I I I I I I I I I I I I M I I I I I I I I II 
orflng NVDFGAEGSN-PDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSY 17 9 
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orfl.pep MDGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIAS 223 

111 II I I : I I I I I I M I I I I I I I I M I I I I I I I I I I I I I I 
orf lng MDGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSG 23 9 

orf 1 . pep GSPMFIYDA QKQKWLINGVLQTGNPYIGKSNG 255 

I I I I I I II I I I I I II I II I I I I I I I I I I I I I I 
orf lng GGTVNLGSEKIKHSPY GFLPTGGSFGDSGSPMFIYDA QKQKWLIN GVLQTGNPYIGKSNG 28 9 

10 orfl.pep FQLVRKDWFYDEIFAGDTHSVFYEPRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRT 315 

I I I I I I I I I I II I I I I I I I I I II II : I I I I I I I I : I I I : II I : II I : I Ml I I I I I I 
orf lng FQLVRKDW FY DEI FAGDTHS VFYE PHQNGKYFFN DNNNGAGKI DAKHKHYS L PYRLKTRT 359 

orfl.pep VQL FNVSL S E TARE PVYHAAGGVN S YRPRLNNGEN I S F I DEGKGEL I LT SN INQGAGGL Y 37 5 

15 I I I I I I I I I I I I I I I I I I I I I j I I I I I II II I I I I I M M : I I I II I I I | M M I | | | | | 

orf lng VQLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLY 

orfl.pep FQGDFTVSPENNETWQGAGVHISEDSTVTWKWGVANDRLSKIGKGT 422 

I : I : I I I II : II M I I I N I [ I I : I I M M I I II I II I I M I I I M 

20 orf lng FEGNFTVSPKNNETWQGAGVHISDGSTVTWKWGVANDRLSKIGKGTLLVQAKGENQGSV 47 9 

// 

orfl.pep DKVT A SLTKTDISGNVD LADHAHLN LT G LA 744 

: I I I : 111:111111 

orf lng FGVAPHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDVRGNVSLADHAHLNLTGLA 7 7 4 

25 

orf 1 .pep TLNGNLSANGDTR-YTVSHNATQNGNXSLVXNAQATFNQATLNGNTSASGNASFNLSDHA 8 03 

I : M II : : : : I I : I I I I I I I III I I II I I I I I I I I I I I I I I I I I I 1 I I :: I 

orf lng TFNGNL-VQAETRTIRLRANATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNA 833 

30 orfl.pep VQNGSLTLSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWT 8 63 

I I I I I I I I I I I I I II I I I I I M I I I I II I I I I I II : II I I I : I I I I I I I I I | I I M | | | 
orf lng VQNGSLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWT 8 93 

orf 1 . pep LPSGXELGNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLXVTPPTSVE 92 3 

35 I M I : I M M M M M II I I I I M I I I I II II I I I : I M I I I II II II I I II I I : I 

orf lng LPSGTELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRS LLSVTPPTSAE 950 

orf 1 .pep SRFNTLTVNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPASLEQLT 983 

! I M I I II I I I I I I I I I II I I I I II I I I I II I I I I I I II I I I I I I I I I I I II : i I M I I 
40 orflng SRFNTLTVNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLT 1010 

orfl.pep WEGKDNKPLSENLNFTLQNEHVDAGAW 1011 

I I I I I I I I I I I II I I I I I I I I I I I I I I 
orflng VVEGKDNTPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGET 1070 

45 // 

orfl.pep LDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 1211 

I I I I I I I I I II I II I I I I I I I I I I I I I I II 
orflng PQRDLISRYANSGLSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFR 1239 

50 orfl.pep AYRQQTDLRQIGMQKNLGSGRVGILFSHNRTENTFDDGIGNSARLAHGAVFGQYGIDRFY 1271 

I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 1 I I I I I I I I I I I I I I I I I I I I I || 
orflng AYRQQTDLRQIGMQKNLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFD 12 99 

orfl.pep IGISAGAGFSSGSLSDGIGXKXRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 1331 

55 I I I I I I I I I I I I M I M I I I I II II I II I I I I I I I I I I I M M I I I I II M M I I M 

orflng IGISAGAGFSSGSLSDGIRGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADY 1359 

orf 1 .pep RYENVNIATPGLAFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVL 1391 

<n I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I II I I I I I I I I I I I 

t>U orflng RYENVNIATPGLAFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVL 1419 

orfl.pep AQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRW 1440 

I I I I I I I I I I I II I I I II I I I I I I I I I I I 

orflng AQDFGKTRSAEWGVNAEIKGFTLSLHAAAAKGPQLEAQHSAGIKLGYRW 14 68 

The complete length ORF1ng nucleotide sequence was identified <SEQ ID 653>:

1 ATGAAAACAA CCGACAAACG GACAACCGAA ACACACCGCA AAGCCCCTAA
51 AACCGGCCGC ATCCGCTTCT CGCCCGCTTA CTTAGCCATA TGCCTGTCGT
101 TCGGCATTCT GCCCCAAGCC CGGGCGGGAC ACACTTATTT CGGCATCAAC
151 TACCAATACT ATCGCGACTT TGCCGAAAAT AAAGGCAAGT TTGCAGTCGG

201 GGCGAAAGAT ATTGAGGTTT ACAACAAAAA AGGGGAGTTG GTCGGCAAAT

251 CGATGACGAA AGCCCCGATG ATTGATTTTT CTGTGGTATC GCGTAACGGC

301 GTGGCGGCAT TGGCGGGCGA TCAATATATT GTGAGCGTGG CACATAACGG
351 CGGCTATAAC AATGTTGATT TTGGTGCGGA GGGAAGCAAT CCCGATCAGC 
401 ACCGCTTTTC TTACCAAATT GTGAAAAGAA ATAATTATAA AGCAGGGACT 
451 AACGGCCATC CTTATGGCGG CGATTATCAT ATGCCGCGTT TGCACAAATT
501 TGTCACAGAT GCAGAACCTG TTGAGATGAC CAGTTATATG GATGGGTGGA 
551 AATACGCTGA TTTAAATAAA TACCCTGATC GTGTTCGAAT CGGAGCAGGC 
601 AGACAATATT GGCGGTCTGA TGAAGACGAA CCCAATAACC GCGAAAGTTC 
651 ATATCATATT GCAAGCGCAT ATTCTTGGCT CGTCGGTGGC AATACCTTTG
701 CACAAAATGG ATCAGGTGGT GGCACAGTCA ACTTAGGTAG CGAAAAAATT 
751 AAACATAGCC CATATGGTTT TTTACCAACA GGAGGCTCAT TTGGCGACAG 
801 TGGCTCACCA ATGTTTATCT ATGATGCCCA AAAGCAAAAG TGGTTAATTA 
851 ATGGGGTATT GCAAACAGGC AACCCCTATA TAGGAAAAAG CAATGGCTTC 
901 CAGCTAGTTC GTAAAGATTG GTTCTATGAT GAAATCTTTG CTGGAGATAC 
951 CCATTCAGTA TTCTACGAAC CACATCAAAA TGGGAAATAC TTTTTTAACG

1001 ACAATAATAA TGGCGCAGGA AAAATCGATG CCAAACATAA ACACTATTCT

1051 CTACCTTATA GATTAAAAAC ACGAACCGTT CAATTGTTTA ATGTTTCTTT

1101 ATCCGAGACA GCAAGAGAAC CTGTTTATCA TGCTGCAGGT GGGGTCAACA 

1151 GTTATCGACC CAGACTGAAT AATGGAGAAA ATATTTCCTT TATTGACAAA 

1201 GGAAAAGGTG AATTGATACT TACCAGCAAC ATCAACCAAG GCGCGGGCGG

1251 TTTGTATTTT GAGGGTAATT TTACGGTCTC GCCTAAAAAC AACGAAACGT 

1301 GGCAAGGCGC GGGCGTTCAT ATCAGTGATG GCAGTACCGT TACTTGGAAA 

1351 GTAAACGGCG TGGCAAACGA CCGCCTGTCC AAAATCGGCA AAGGCACGCT

1401 GCTGGTTCAA GCCAAAGGGG AAAACCAAGG CTCGGTCAGC GTGGGCGACG 

1451 GTAAAGTCAT CTTAGATCAG CAGGCGGACG ATCAAGGCAA AAAACAAGCC 

1501 TTTAGTGAAA TCGGCTTGGT CAGCGGCAGG GGGACGGTGC AACTGAATGC 

1551 CGATAATCAG TTCAACCCCG ACAAACTCTA TTTCGGCTTT CGCGGCGGAC

1601 GTTTGGATTT GAACGGGCAT TCGCTTTCGT TCCACCGCAT TCAAAATACC 

1651 GATGAAGGGG CGATGATTGT CAACCACAAT CAAGACAAAG AATCCACCGT 

1701 TACCATTACA GGCAATAAAG ATATTACTAC AACCGGCAAT AACAACAACT

1751 TGGATAGCAA AAAAGAAATT GCCTACAACG GTTGGTTTGG CGAGAAAGAT 

1801 GCAACCAAAA CGAACGGGCG GCTCAATCTG AATTACCAAC CGGAAGAAGC

1851 GGATCGCACT TTACTGCTTT CCGGCGGAAC AAATTTAAAC GGCAATATCA 

1901 CGCAAACAAA CGGCAAACTG TTTTTCAGCG GCAGACCGAC ACCGCACGCC 

1951 TACAATCATT TAGGAAGCGG GTGGTCAAAA ATGGAAGGTA TCCCACAAGG 

2001 AGAAATCGTG TGGGACAACG ATTGGATCGA CCGCACATTT AAAGCGGAAA 

2051 ACTTCCATAT TCAGGGCGGA CAAGCGGTGG TTTCCCGCAA TGTTGCCAAA 

2101 GTGGAAGGCG ATTGGCATTT AAGCAATCAC GCCCAAGCAG TTTTCGGTGT 

2151 CGCACCGCAT CAAAGCCACA CAATCTGTAC ACGTTCGGAC TGGACGGGTC 

2201 TGACAAGTTG TACCGAAAAA ACCATTACCG ACGATAAAGT GATTGCTTCA 

2251 TTGAGCAAGA CCGACATCAG AGGCAATGTC AGCCTTGCCG ATCACGCTCA

2301 TTTAAATCTC ACAGGACTTG CCACACTCAA CGGCAATCTT AGTGCAGGCG 

2351 GAGACACGCA CTATACGGTT ACGCGCAACG CCACCCAAAA CGGCAACCTC 

2401 AGCCTCGTGG GCAATGCCCA AGCAACATTT AATCAAGCCA CATTAAACGG 

2451 CAACACATCG GCTTCGGACA ATGCTTCATT TAATCTAAGC AACAACGCCG 

2501 TACAAAACGG CAGTCTGACG CTTTCCGACA ACGCTAAGGC AAACGTAAGC 

2551 CATTCCGCAC TCAACGGCAA TGTCTCCCTA GCCGATAAGG CAGTATTCCA 

2601 TTTTGAAAAC AGCCGCTTTA CCGGAAAAAT CAGCGGCGGC AAGGATACGG

2651 CATTACACTT AAAAGACAGC GAATGGACGC TGCCGTCGGG CACGGAATTA
2701 GGCAATTTAA ACCTTGACAA CGCCACCATT ACACTCAATT CCGCCTATCG

2751 ACACGATGCG GCAGGCGCGC AAACCGGCAG TGCGGCAGAT GCGCCGCGCC
2801 GCCGTTCGCG CCGTTCCCTA TTATCCGTTA CGCCGCCAAC TTCGGCAGAA

2851 TCCCGTTTCA ACACGCTGAC GGTAAACGGC AAATTGAACG GTCAGGGAAC
2901 ATTCCGCTTT ATGTCGGAAC TCTTCGGCTA CCGCAGCGGC AAATTGAAGC 
2951 TGGCGGAAAG TTCCGAAGGC ACTTACACCT TGGCTGTCAA CAATACCGGC 
3001 AACGAACCCG TAAGTCTCGA GCAATTGACG GTAGTGGAAG GAAAAGACAA 
3051 CACACCGCTG TCCGAAAATC TTAATTTCAC CCTGCaaaAc gaacacgtcg 
3101 atgccggcgc atggCGTTAT CAGCTTATCC gcaaagacgG CGAGTTCCgc 
3151 CTGCATAATC CGGTCAAAGA ACAAGAGCTT TCCGACAAAC TCGGCAAGgc 
3201 gggagaaACA GAggccgccT TGACGGCAAA ACAGGCacaA CTTGCCGCCA 
3251 AAcaacaggc ggaaaAAGAC AACgcgcaaa gccttgAcgc gctgattgcg 
3301 gCcgggcgca atgccaccga AAAGGCAgaa agtgttgccg aaccgGCCCG 
3351 GCAGGCAGGC GGGGAAAAtg ccgGCATTAT GCAGGCGGAG GAAGAGAAAA 

3401 AACGGGTGCA GGCGGATAAA GACACCGCCT TGGCGAAACA GCGCGAAGCG
3451 GAAACCCGGC CGGCTACCAC CGCCTTCCCC CGCGCCCGCC GCGCCCGCCG
3501 GGATTTGCCG CAACCGCAGC CCCAACCGCA ACCCCAACCG CAGCGCGACC
3551 TGATCAGCCG TTATGCCAAT AGCGGTTTGA GTGAATTTTC CGCCACGCTC
3601 AACAGCGTTT TCGCCGTACA GGACGAATTG GACCGCGTGT TTGCCGAAGA
3651 CCGCCGCAAC GCCGTTTGGA CAAGCGGCAT CCGGGACACC AAACACTACC
3701 GTTCGCAAGA TTTCCGCGCC TACCGCCAAC AAACCGACCT GCGCCAAATC
3751 GGTATGCAGA AAAACCTCGG CAGCGGGCGC GTCGGCATCC TGTTTTCGCA 
3801 CAACCGGACC GGAAACACCT TCGACGACGG CATCGGCAAC TCGGCACGGC 
3851 TTGCCCACGG TGCCGTTTTC GGGCAATACG GCATCGGCAG GTTCGACATC 
3901 GGCATCAGCG CGGGCGCGGG TTTTAGTAGC GGCAGCCTTT CAGACGGCAT 
3951 CAGAGGCAAA ATCCGCCGCC GCGTGCTGCA TTACGGCATT CAGGCAAGAT 

4001 ACCGCGCAGG TTTCGGCGGA TTCGGCATCG AACCGCACAT CGGCGCAACG
4051 CGCTATTTCG TCCAAAAAGC GGATTACCGA TACGAAAACG TCAATATCGC 
4101 CACCCCGGGC CTTGCATTCA ACCGCTACCG CGCGGGCATT AAGGCAGATT 
4151 ATTCATTCAA ACCGGCGCAA CACATTTCCA TCACGCCTTA TTTGAGCCTG 
4201 TCCTATACCG ATGCCGCTTC CGGCAAAGTC CGAACGCGCG TCAATACCGC 
4251 CGTATTGGCG CAGGATTTCG GCAAAACCCG CAGTGCGGAA TGGGGCGTAA 
4301 ACGCCGAAAT CAAAGGTTTC ACGCTGTCCC TCCACGCTGC CGCCGCCAAG 
4351 GGGCCGCAAT TGGAAGCGCA GCACAGCGCG GGCATCAAAT TAGGCTACCG 
4401 CTGGTAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 654>: 

1 MKTTDKRTTE THRKAPKTGR IRFSPAYLAI CLSFGILPQA RAGHTYFGIN 

51 YQYYRDFAEN KGKFAVGAKD IEVYNKKGEL VGKSMTKAPM IDFSVVSRNG

101 VAALAGDQYI VSVAHNGGYN NVDFGAEGSN PDQHRFSYQI VKRNNYKAGT 

151 NGHPYGGDYH MPRLHKFVTD AEPVEMTSYM DGWKYADLNK YPDRVRIGAG 

201 RQYWRSDEDE PNNRESSYHI ASAYSWLVGG NTFAQNGSGG GTVNLGSEKI 

251 KHSPYGFLPT GGSFGDSGSP MFIYDAQKQK WLINGVLQTG NPYIGKSNGF

301 QLVRKDWFYD EIFAGDTHSV FYEPHQNGKY FFNDNNNGAG KIDAKHKHYS

351 LPYRLKTRTV QLFNVSLSET AREPVYHAAG GVNSYRPRLN NGENISFIDK

401 GKGELILTSN INQGAGGLYF EGNFTVSPKN NETWQGAGVH ISDGSTVTWK

451 VNGVANDRLS KIGKGTLLVQ AKGENQGSVS VGDGKVILDQ QADDQGKKQA 

501 FSEIGLVSGR GTVQLNADNQ FNPDKLYFGF RGGRLDLNGH SLSFHRIQNT 

551 DEGAMIVNHN QDKESTVTIT GNKDITTTGN NNNLDSKKEI AYNGWFGEKD 

601 ATKTNGGLNL NYPPEEADRT LLLSGGTNLN GNITQTNGKL FFSGRPTPHA 

651 YNHLGSGWSK MEGIPQGEIV WDNDWIDRTF KAENFHIQGG QAVVSRNVAK

701 VEGDWHLSNH AQAVFGVAPH QSHTICTRSD WTGLTSCTEK TITDDKVIAS

751 LSKTDVRGNV SLADHAHLNL TGLATFNGNL VQAETRTIRL RANATQNGNL
801 SLVGNAQATF NQATLNGNTS ASDNASFNLS NNAVQNGSLT LSDNAKANVS

851 HSALNGNVSL ADKAVFHFEN SRFTGKISGG KDTALHLKDS EWTLPSGTEL
901 GNLNLDNATI TLNSAYRHDA AGAQTGSAAD APRRRSRRSL LSVTPPTSAE 
951 SRFNTLTVNG KLNGQGTFRF MSELFGYRSG KLKLAESSEG TYTLAVNNTG 

1001 NEPVSLEQLT VVEGKDNTPL SENLNFTLQN EHVDAGAWRY QLIRKDGEFR 

1051 LHNPVKEQEL SDKLGKAGET EAALTAKQAQ LAAKQQAEKD NAQSLDALIA 

1101 AGRNATEKAE SVAEPARQAG GENAGIMQAE EEKKRVQADK DTALAKQREA 

1151 ETRPATTAFP RARRARRDLP QPQPQPQPQP QRDLISRYAN SGLSEFSATL 

1201 NSVFAVQDEL DRVFAEDRRN AVWTSGIRDT KHYRSQDFRA YRQQTDLRQI 

1251 GMQKNLGSGR VGILFSHNRT GNTFDDGIGN SARLAHGAVF GQYGIGRFDI 

1301 GISAGAGFSS GSLSDGIRGK IRRRVLHYGI QARYRAGFGG FGIEPHIGAT 

1351 RYFVQKADYR YENVNIATPG LAFNRYRAGI KADYSFKPAQ HISITPYLSL 

1401 SYTDAASGKV RTRVNTAVLA QDFGKTRSAE WGVNAEIKGF TLSLHAAAAK

1451 GPQLEAQHSA GIKLGYRW* 

Underlined and double-underlined sequences represent the active site of a serine protease (trypsin 
family) and an ATP/GTP-binding site motif A (P-loop). 



ORF1-1 and ORF1ng show 93.7% identity in 1471 aa overlap:



MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAWAGHTYFGINYQYYRDFAEN 

N I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | I I | | | | | M I I 

MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQARAGHTYFGINYQYYRDFAEN 
10 20 30 40 50 60 

70 80 90 100 110 120 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALVGDQYIVSVAHNGGYN 

I I I I I I I I I I I I I N I I I I I I I I I I : I I I I I I I I 

KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQY I VSVAHNGGYN 

7 0 80 90 100 110 120 
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130 140 150 160 170 180 

orfl-1 Deo NV DFGAEGRNPDQHRFTYKIVKRNNYKAGTKGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 
orfl l.pep N |,,||||:,:|||||||||l|:||||||||||||||lllllllllllllll 

orflna-1 NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 
g 130 140 150 160 170 180 

190 200 210 220 230 240 

nT - f 1 -1 r, PD DGRKYIDQNNYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 
P P | | || | |: | || MM II II II I I MM I I! II II II UN II II II II II II II I Hi 

n-rf -lnd-l DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 

g 190 200 210 220 230 240 

250 260 270 280 290 300 

orfl-1 pep QTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 

| M I I I I I I M II II II I I I II I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 

orflna-1 QTVNLGSEKIKHSPYGFLPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 

250 260 270 280 290 300 

310 320 330 340 350 360 

orfl-1 pep QLVRKDWFYDEIFAGDTHSVFYE PRQNGKYSFNDDNNGTGKINAKHEHNSLPNRLKTRTV 

M || | | | | I I M I I I I I I I I I I I I : I I I I I I I I : I II : I II : I I I : I III I I I I I I I 
orf1ng-1 QLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV 
310 320 330 340 350 360 

370 380 390 400 410 420 

orf1-1.pep QLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDEGKGELILTSNINQGAGGLYF 

I I I I I I I I M I I I I I I I II I I I I II II II I I I I I I I I I I : I I I I I I I I I I I I M I I I II I 

orf1ng-1 QLFNVSLSETAREPVYHAAGGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLYF 
370 380 390 400 410 420 

430 440 450 460 470 480 

orfl-1 pep QGDFTVSPENNETWQGAGVHISEDSTVTWKVNGVANDRLSKIGKGTLHVQAKGENQGSIS 
: | : | I I I I : I I I I I II I I I I I I : I II I I I I I I I I II M I II I II I I M M II II I I : I 
orflng-1 EGNFTVSPKNNETWQGAGVHISDGSTVTWKWGVANDRLSKIGKGTLLVQAKGENQGSVS 

430 440 450 460 470 480 

490 500 510 520 530 540 

orfl-1 pep VGDGTVILDQQADDKGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH 

I I I I I II II I I I I : II II 1 I I I I I I 1 I M II II II II II II I M I I I I I II I I I 

orflng-1 VGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNGH 

490 500 510 520 530 540 

550 560 570 580 590 600 

orfl-1 pep SLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDIATTGNNNSLDSKKEIAYNGWFGEKD 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I : M I II I M II I 1 I I I I I 
orflng-1 SLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDITTTGNNNNLDSKKE IAYNGWFGEKD 

550 560 570 580 590 600 

610 620 630 640 650 660 

orfl-1 . pep TTKTNGRLNLVYQPAAEDRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLNDHWSQ 
: I I I I I I I I I III I I I I I I I I 1 I 1 I I I I I I I I I II I I II I I I I I I I I I M : : II: 
orflng-1 ATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSGWSK 
610 620 630 640 650 660 

670 680 690 700 710 720 

orfl-1 . pep KEGIPRGEIVWDNDWINRTFKAENFQIKGGQAWSRNVAKVKGDWHLSNHAQAVFGVAPH 
I I I I : I I M I I I I I I : I I I I I I II : I : M II I I I I I I I I I : I I I I I I I I II II I I I I I I 

orflng-1 MEGIPQGEIWDNDWIDRTFKAENFHIQGGQAWSRNVAKVEGDWHLSNHAQAVFGVAPH 

670 680 690 700 710 720 

730 740 750 760 770 780 

orfl-1 . pep QSHTICTRSDWTGLTNCVEKTITDDKVIASLTKTDISGNVDLADHAHLNLTGLATLNGNL 

III I I I I 1 M : I : I I II I I I I I I I II : I I I I I I I : I I I I I I I I 

orflng-1 QSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLNGNL 

730 740 750 760 770 780 

790 800 810 820 830 840 

orfl-1 . pep SANGDTRYTVSHNATQNGNLSLVGNAQATFNQATLNGNTSASGNASFNLSDHAVQNGSLT 
I I : I I I : I I I : : I 1 I I I I I I M I I I I I I I I I I I I I I I I I 1 I I I I I I I I I : : I I I I I II I 
orf1ng-1 SAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNAVQNGSLT 

790 800 810 820 830 840 

850 860 870 880 890 900 

LSGNAKANVSHSALNGNVSLADKAVFHFESSRFTGQISGGKDTALHLKDSEWTLPSGTEL 

|| | | | | | | || | | | I I I I U [ I I I I I I I I : I I II I : I ! I I I I I H I I II I I I M 

LSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSGTEL 
850 860 870 880 890 900 

910 920 930 940 950 960 

GNLNLDNATITLNSAYRHDAAGAQTGSATDAPRRRSRRSRRSLLSVTPPTSVESRFNTLT 
| | | | | | | | | || I I I I I II I II II I I I I I : I I II I I II I I I I I I I ! I ! I : I I I I I I I I 
GNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSR— RSLLSVTPPTSAESRFNTLT 

910 920 930 940 950 

970 980 990 1000 1010 1020 

VNGKLNGQGTFRFMSELFGYRSDKLKLAESSEGTYTLAVNNTGNEPAS LEQLTVVEGKDN 

I | | | | | | | I M I I I I II I I I I I I I I I I I M I I I 1 I 1 I I I I I I I I I : I I I I M II I I I I I 
VNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVS LEQLTVVEGKDN 

960 970 980 990 1000 1010 

1030 1040 1050 1060 1070 

KPLSENLNFTLQNEHVDAGAWRYQLIRKDGE FRLHNPVKEQELSDKLGKA 

I | I I || II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M II II 
TPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK 
1020 1030 1040 1050 1060 1070 

1080 1090 1100 1110 1120 

EAKKQAEKDNAQSLDALIAAGRDAVEKTE SVAE PARQAGGENVGIMQAEEEKKRVQ 

I I : I M I I I I I I I I I I I I M I : I : I I : I I I I I I I I I I I I I I = M I I I I I I I I I I I 
QAQLAAKQQAEKDNAQSLDALIAAGRNATEKAESVAEPARQAGGENAGIMQAEEEKKRVQ 
1080 1090 1100 1110 1120 1130 

1130 1140 1150 1160 1170 1180 

ADKDTALAKQREAETRPATTAFPRARRARRDLPQLQPQPQPQPQRDLISRYANSGLSEFS 
I II II II II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I N I I I 
ADKDTALAKQREAETRPATTAFPRARRARRDLPQPQPQPQPQPQRDLISRYANSGLSEFS 
1140 1150 1160 1170 1180 1190 

1190 1200 1210 1220 1230 1240 

ATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLG 
I I I I I I I I I I I I I I I I I M I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II II I 
ATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQTDLRQIGMQKNLG 
1200 1210 1220 1230 1240 1250 

1250 1260 1270 1280 1290 1300 

SGRVGILFSHNRTENTFDDGIGNSARLAHGAVFGQYGIDRFYIGISAGAGFSSGSLSDGI 
I I I I I II II M I I I I I I I I I I II I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I 
SGRVGILFSHNRTGNT FDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSGSLSDGI 
1260 1270 1280 1290 1300 1310 

1310 1320 1330 1340 1350 1360 

GGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYR 
I I I I I I I I I I M I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I 
RGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGLAFNRYR 
1320 1330 1340 1350 1360 1370 

1370 1380 1390 1400 1410 1420 

AGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEI 
I I I I I I I I ] II I I I I I I II I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I M 
AGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEWGVNAEI 
1380 1390 1400 1410 1420 1430 

1430 1440 1450 

KGFTLSLHAAAAKGPQLEAQHSAGIKLGYRWX 

KGFTLSLHAAAAKGPQLEAQHSAGIKLGYRWX 
1440 1450 1460 



In addition, ORF1ng shows 55.7% identity with hap protein (P45387) over a 1455aa overlap: 

SCORES Init1: 1104 Initn: 4632 Opt: 2680 

Smith-Waterman score: 5165; 55.7% identity in 1455 aa overlap 



orflng-1 .pep 



MKTTDKRTTETHRKAPKTGRIRFSPAYLAICLSFGILPQAFAGHTYFGINYQYYRDFAEN 
| : | : | : I : I I : i I I 1 I i I I I I : I I I I I M I I I 
„ „ al MKKTVFRLN FLT AC I S LG I VSQAWAGHT YFG I D YQYYRD FAEN 

p45387 10 20 30 40 

10 70 80 90 100 110 120 

orflnCT _! pep KGKFAVGAKDIEVYNKKGELVGKSMTKAPMIDFSWSRNGVAALAGDQYIVSVAHNGGYN 

9 ||M:M1::|:MII:|:IM I I! I I I I I i I i I M I I I I M I : :| II II II II M = 

d45387 kgkFTVGAQNIKVYNKQGQLVGTSMTKAPMIDFSWSRNGVAALVENQYIVSVAHNVGYT 

y 50 60 70 80 90 100 

15 130 140 150 160 170 180 

orflncr-1 pep NVDFGAEGSNPDQHRFSYQIVKRNNYKAGTNGHPYGGDYHMPRLHKFVTDAEPVEMTSYM 
:|||||II:I1MIII:|:IIIIIII Ill Mllllll:l |::lll I 

p4538 7 DVDFGAEGNNPDQHRFTYKIVKRNNYKKD-NLHPYEDDYHNPRLHKFVTEAAPIDMTSNM 

20 110 120 130 140 150 160 

190 200 210 220 230 240 

orflnq-1 pep DGWKYADLNKYPDRVRIGAGRQYWRSDEDEPNNRESSYHIASAYSWLVGGNTFAQNGSGG 
: | | : I : I I I : I I I I I : I 1 I : I I : I : i : : : : | : I 1 : I : : I II I I : I : 

95 p4 538 7 NGSTYSDRTKYPERVRIGSGRQFWRNDQDKGD QVAGAYHYLTAGNTHNQRGAGN 

170 180 190 200 210 

250 260 270 280 290 300 

orflng-1 pep GTVNLGSEKIKHSPYGELPTGGSFGDSGSPMFIYDAQKQKWLINGVLQTGNPYIGKSNGF 
30 I II:: I : I I I I : I I I I 1 I ! I I I I I I I : I I I I I I I I : I : I I I : I I I I I 

p45387 GYSYLGGDVRKAGEYGPLPIAGSKGDSGSPMFIYDAEKQKWLINGILREGNPFEGKENGF 
220 230 240 250 260 270 

310 320 330 340 350 360 
35 orflng-1 pep QLVRKDWFYDEIFAGDTHSVFYEPHQNGKYFFNDNNNGAGKIDAKHKHYSLPYRLKTRTV 
| I I I I :: I I I ] I I I : : I I I I : : I : I I I : I I : : I : : I = 
p4538 7 QLVRKSYF— DE I FERDLHT SLYTRAGNGVYT I SGNDNGQGS ITQKS GIPSEIK 1 

280 290 300 310 320 

40 370 380 390 400 410 419 

orflng-1 pep QLFNVSLSETAREPVYHAA-GGVNSYRPRLNNGENISFIDKGKGELILTSNINQGAGGLY 
||:|| :: |:: I I I 1111111 = : 1 = 1= =1 I I :: I : I I I I I I I I I 

p45387 TLANMSLPLKEKDKVHNPRYDGPNIYSPRLNNGETLYFMDQKQGSLIFASDINQGAGGLY 
330 340 350 360 370 380 

45 

420 430 440 450 460 470 479 

orflng-1 . pep FEGNFTVSPKNNETWQGAGVHISDGSTVTWKVNGVANDRLSKIGKGTLLVQAKGENQGSV 

I I I M I I I I : : I : I I I I I I : I : I : : I I II I I I I I I : I I I II I I I I I I I I I I I I I : I I 
p4 53 87 FEGNFTVSPNSNQTWQGAGIHVSENSTVTWKVNGVEHDRLSKIGKGTLHVQAKGENKGSI 
50 390 400 410 420 430 440 

480 490 500 510 520 530 539 

orflng-1 . pep SVGDGKVILDQQADDQGKKQAFSEIGLVSGRGTVQLNADNQFNPDKLYFGFRGGRLDLNG 

1 M I I I M 1 : I I I I I I I : I I I I I I I I I II I I I I I I I I I : II : I I : M II I ! I i I I I I I 

55 p 4 5387 SVGDGKVILEQQADDQGNKQAFSEIGLVSGRGTVQLNDDKQFDTDKFYFGFRGGRLDLNG 

450 460 470 480 490 500 

540 550 560 570 580 590 

orflng-1 . pep HSLSFHRIQNTDEGAMIVNHNQDKESTVTITGNKDITT— TGNN— NNLDSKKEIAYNGWFG 

60 I I I : ! : I I I I I I II I I I I I I I : : : I I I I I I : : I : : I I I I : I I : I I I I I 

p45387 HSLTFKRIQNTDEGAMIVNHNTTQAANVTITGNESIVLPNGNNINKLDYRKEIAYNGWFG 

510 520 530 540 550 560 

600 610 620 630 640 650 

65 orflng-1 .pep EKDATKTNGRLNLNYQPEEADRTLLLSGGTNLNGNITQTNGKLFFSGRPTPHAYNHLGSG 

I I : I I j I I I I 1:1 I I I I I I I I I I I I : I : I I I I : I I I I I I I I I I I I I I II I : : 
p45387 ETDIvNKHNGRLNLIYKPTTEDRTLLLSGGTNLKGDITQTKGKLFFSGRPTPHAYNHLNKR 
570 580 590 600 610 620 

660 670 680 690 700 710 

orf1ng-1.pep WSKMEGIPQGEIVWDNDWIDRTFKAENFHIQGGQAWSRNVAKVEGDWHLSNHAQAVFGV 

| I : I I I I I I I I I I I I : I I I : I i I 1 I I I I : I : I I : I 11 I I I I — : I I : I : I I : I : I : I I I 
„ 4 s o 87 WSEME GIPQGEIVWDHDWINRTFKAENFQIKGGSAWSRNVSSIEGNWTVSNNANATFGV 
P 6 30 640 650 660 670 680 

720 730 740 750 760 770 

orflna-1 Pep APHQSHTICTRSDWTGLTSCTEKTITDDKVIASLSKTDIRGNVSLADHAHLNLTGLATLN 

:|:|::M Mill Ml 11:1 : : 1 I I I I I : I I : I I : : : I : I : I 1=11111 
D45387 VPNQQNTICTRSDWTGLTTCQKVDLTDTKVINSIPKTQINGSINLTDNATANVKGLAKLN 
P 690 700 710 720 730 740 

780 790 800 810 820 830 

orf lng-1 . pep GNLSAGGDTHYTVTRNATQNGNLSLVGNAQATFNQATLNGNTSASDNASFNLSNNAVQNG 

p45387 GNVTL TNHSQFTLSNNATQIG 

750 760 770 

840 850 860 870 880 890 

orf lng-1 pep SLTLSDNAKANVSHSALNGNVSLADKAVFHFENSRFTGKISGGKDTALHLKDSEWTLPSG 

:: UN: 1:1::: Mill 1:1:1 I ::M:|: =1:1 I I- I": 11 = 11 
P4538 7 NiRLSDNSTATVDNANLNGNVHLTDSAQFSLKNSHFSHQIQGDKGTTVTLENATWTMPSD 
780 790 800 810 820 830 

900 910 920 930 940 950 

TELGNLNLDNATITLNSAYRHDAAGAQTGSAADAPRRRSRRSLLSVTPPTSAESRFNTLT 
| | | | : | : | : M M I I I 1 = = 1= ::IMM 1 = I I I I I I Mill! 

45387 TTLQNLTLNNSTITLNSAY SASSNNTPRRRS- 

840 850 860 

960 970 980 990 1000 1010 

rf lnq-1 pep VNGKLNGQGTFRFMSELFGYRSGKLKLAESSEGTYTLAVNNTGNEPVSLEQLTWEGKDN 

I | | | | : I M I I : I I M II : I I I I I : : : : I I I Ml I I I : I I : I I I I I : I I : I I I 
45387 VNGKLSGQGTFQFTSSLFGYKSDKLKLSNDAEGDYIL5VRNTGKEPETLEQLTLVESKDN 

880 890 900 910 920 930 

1020 1030 1040 1050 1060 1070 

rf lng-1 pep TPLSENLNFTLQNEHVDAGAWRYQLIRKDGEFRLHNPVKEQELSDKLGKAGETEAALTAK 
I I I : : M M I = I = I II I I I M = I : = : I I I I I I II I : I I 1 I I = 1 =1 = = I : I II 

4 5387 QPLSDKLKFTLENDHVDAGALRYKLVKNDGEFRLHNPIKEQELHNDLVRAEQAERTLEAK 
940 950 960 970 980 990 

1080 1090 1100 1110 1120 1130 

rf lng-1 . pep QAQLAAKQQAEKDNAQSLDALIAAGRNAT-EKAESVAEPARQAGGENAGIMQAEEEKKRV 
| : : : I I I : : = = = I 1 II = = = = : I I = I 1 =1 = = = = = I = I 
,45387 QVEPTAKTQTGEPKVRSRRAARAAFPDTLPDQSLLNALEAKQAE-LTAETQKSKAKTKKV 
1000 1010 1020 1030 1040 1050 

1140 1150 1160 1170 1180 1190 

>rf lng-1 .pep QADK DTALAKQREAETRPATTAFPRARRARRD-LPQPQPQPQPQPQRDLISRYANSG 

: : : : | | : I : : : : : : : I I I : = I = M i I I I II = I I = 

,4 5387 • RSKRAVFSDPLLDQSLFALEAALEVIDAPQQSEKDRLAQEEAEKQ-RKQKDLISRYSNSA 
1060 1070 1080 1090 1100 1110 

1200 1210 1220 1230 1240 1250 

>rf lng-1 . pep LSEFSATLNSVFAVQDELDRVFAEDRRNAVWTSGIRDTKHYRSQDFRAYRQQ-TDLRQIG 

II I : I I I : I I = := M I I II I = I :: : == I II I : =1 :: I I : II M : II I : I I I I I 

14 5387 LSELSATVNSMLSVQDELDRLFVDQAQSAVWTNIAQDKRRYDSDAFRAYQQQKTNLRQIG 
1120 1130 1140 1150 1160 1170 

1260 1270 1280 1290 1300 1310 

irf lng-1 . pep MQKNLGSGRVGILFSHNRTGNTFDDGIGNSARLAHGAVFGQYGIGRFDIGISAGAGFSSG 

Ml I : : I I : I : I I I : I : MM: : I I I : : Mil I : : : I : : : I : I : I : : 

,4 53 87 VQKALANGRIGAVFSHSRSDNTFDEQVKNHATLTMMSGFAQYQWGDLQFGVNVGTGISAS 
1180 1190 1200 1210 1220 1230 

1320 1330 1340 1350 1360 1370 

>rf lng-1 .pep SLSDGIRGKIRRRVLHYGIQARYRAGFGGFGIEPHIGATRYFVQKADYRYENVNIATPGL 
: : : : II : I : : : : [ I : : I I : : I : I I : I : = I = : I I I : : : : I : Ml : MM 

>45387 KMAEEQSRKIHRKAINYGVNASYQFRLGQLGIQPYFGVNRYFIERENYQSEEVRVKTPSL 
1240 1250 1260 1270 1280 1290 

1380 1390 1400 1410 1420 1430 

orflng-1 pep AFNRYRAGIKADYSFKPAQHISITPYLSLSYTDAASGKVRTRVNTAVLAQDFGKTRSAEW 

MM! M I : M I : I I : : M I : I I = : : I : I : : : : : I : I M Ml I I I : 1 
p45387 AFNRYNAGIRVDYTFTPTDNISVKPYFFVNYVDVSNANVQTTVNLTVLQQPFGRYWQKEV 
1300 1310 1320 1330 1340 1350 



1440 1450 1460 1469 

orflng-1 . pep GVN AE I KG FT L S LHAAAAKGP QL E AQH SAG I KLG YRWX 

I : M M I M : : M M I : : M M M M I 
p4 5 3 8 7 GLKAE I LHFQ I SAFI SKSQGSQLGKQQNVGVKLGYRW 

1360 1370 1380 1390 
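
The Smith-Waterman score quoted for the ORF1ng/P45387 comparison is a local-alignment score. The following is a minimal sketch of how such a score is computed, not the program or scoring scheme used for the analysis above: the function name and the match, mismatch and gap values are illustrative assumptions (the quoted score would have been obtained with an amino acid substitution matrix and gap penalties that are not stated here).

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local-alignment score between sequences a and b.

    Scoring values are hypothetical placeholders, not those behind the
    scores reported in the text.
    """
    prev = [0] * (len(b) + 1)          # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)      # current row, column 0 stays 0
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # gap in b
            left = curr[j - 1] + gap   # gap in a
            curr[j] = max(0, diag, up, left)  # 0 lets an alignment restart
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # Short fragment taken from the start of ORF23, aligned against itself.
    print(smith_waterman_score("GYNYLFARG", "GYNYLFARG"))
```

Only the score is tracked here; recovering the aligned residues themselves would additionally require a traceback through the full matrix.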

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 78 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 655>: 

1 ..AAGGTGTGGC AATTTGTCGA AGA.CCGCTG CGTGCCGTCG TGCCTGCCGA 

51 CAGTTTTGAA CCGACCGCGC AAAAATTGAA CCTGTTTAAG GCGGGTGCGG 

101 CAACCATTTT GTTTTATGAA GATCAAAATG TCGTCAAAGG TTTGCAGGAG 

151 CAGTTCCCTG CTTATGCCGC TAACTTCCCC GTTTGGGCGg ATCAGGCAAA 

201 CGCGATGGTG CAGTATGCCG TTTGGACGAC ACTTGCCGCG GTCGGCGTAG 

251 GTGCAAACCT GCAACATTAC AATCCCTTGC CCGATGCGGC GATTGCCAAA 

301 GCGTGGAATA TCCCCGAAAA CTGGTTGTTG CGCGCACAAA TGGTTATCGG 

351 CGGTATTGAA GGGGCGGCAG GTGAAAAGAC CTTTGAACCC GTTGCAGAAC 

401 GTTTGAAAGT GTTCGGCGCA TAA 

This corresponds to the amino acid sequence <SEQ ID 656; ORF6>: 



1 ..KVWQFVEXPL RAVVPADSFE PTAQKLNLFK AGAATILFYE DQNVVKGLQE 
51 QFPAYAANFP VWADQANAMV QYAVWTTLAA VGVGANLQHY NPLPDAAIAK 

101 AWNIPENWLL RAQMVIGGIE GAAGEKTFEP VAERLKVFGA * 
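
The correspondence between the partial DNA sequence and the amino acid sequence given above can be checked by translating the DNA codon by codon. The sketch below assumes the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative and are not taken from the application. Any codon containing an undetermined base (such as the "." in the first line of SEQ ID 655) is reported as X, which matches the X in the listed protein.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (standard genetic code).
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    seq = "".join(dna.split()).upper()   # drop the block spacing of the listing
    protein = []
    for i in range(0, len(seq) - 2, 3):  # ignore a trailing partial codon
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # First three blocks of the partial sequence listed above.
    print(translate("AAGGTGTGGC AATTTGTCGA AGA.CCGCTG"))
```

The stop codon is rendered as `*`, the same convention used at the end of the protein listings.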

Further sequence analysis revealed a further partial DNA sequence <SEQ ID 657>: 



1 ..CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG CGCAAAAATT 

51 GAACCTGTTT AAGGCGGGTG CGGCAACCAT TTTGTTTTAT GAAGATCAAA 

101 ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC CGCTAACTTC 

151 CCCGTTTGGG CGGATCAGGC AAACGCGATG GTGCAGTATG CCGTTTGGAC 

201 GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT TACAATCCCT 

251 TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA AAACTGGTTG 

301 TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG CAGGTGAAAA 

351 GACCTTTGAA CCCGTTGCAG AACGTTTGAA AGTGTTCGGC GCATAA 

This corresponds to the amino acid sequence <SEQ ID 658; ORF6-1>: 



1 ..LRAVVPADSF EPTAQKLNLF KAGAATILFY EDQNVVKGLQ EQFPAYAANF 
51 PVWADQANAM VQYAVWTTLA AVGVGANLQH YNPLPDAAIA KAWNIPENWL 

101 LRAQMVIGGI EGAAGEKTFE PVAERLKVFG A* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF6 shows 98.6% identity over a 140aa overlap with an ORF (ORF6a) from strain A of N. 
meningitidis: 

10 20 30 

orf 6 . pep KVWQFVEXPLRAWPADSFE PTAQKLNLFK 

M M M I M M M M M I M M I M M I 

orf6a QIVEHAVLHTPSSFNSQSARVVVLFGEEHDKVWQFVEDALRAVVPADSFEPTAQKLNLFK 
40 50 60 70 80 90 

orf6 pep AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY 

5 IN U I I I I I I I I I I I I I 1 I I I M I I I I I 1 I I I I I I I I I I I I M M M I I I 1 1 

orfSa AGAAT I L FYEDQNWKGLQEQFPAYAAN FPVWADQANAMVQYAVWTT LAAVGVGANLQHY 

100 110 120 130 140 150 

100 110 120 130 140 

orf6.pep NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 

I I I M M I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

orf6a NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 
160 170 180 190 200 
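
The identity figure quoted for an alignment such as the one above can be recomputed from the two aligned rows. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the overlap is the number of alignment columns (gap columns included) and that an undetermined residue (X) counts as a mismatch. Under those assumptions, 138 identical positions in a 140 aa overlap round to the 98.6% stated in the text.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Return (identical, overlap, percent) for two aligned rows.

    Assumes both rows come from the same alignment and therefore have
    equal length; gap columns count toward the overlap but never as
    identities.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    overlap = len(row_a)
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    percent = round(100.0 * identical / overlap, 1) if overlap else 0.0
    return identical, overlap, percent


if __name__ == "__main__":
    # First 30 aligned residues of ORF6 and ORF6a from the alignment above.
    orf6 = "KVWQFVEXPLRAVVPADSFEPTAQKLNLFK"
    orf6a = "KVWQFVEDALRAVVPADSFEPTAQKLNLFK"
    print(percent_identity(orf6, orf6a))
```

Programs differ in how they treat gaps and ambiguous residues, so a recomputed value may deviate slightly from a reported one if the conventions assumed here do not match.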

The complete length ORF6a nucleotide sequence <SEQ ID 659> is: 

1 ATGACCCGTC AATCTCTGCA ACAGGCTGCC GAAAGCCGCC GTTCCATTTA 

51 TTCGTTAAAT AAAAATCTGC CCGTCGGCAA AGATGAAATC GTCCAAATCG 

101 TCGAACACGC CGTTTTGCAC ACACCTTCTT CGTTCAATTC CCAATCTGCC 

151 CGTGTGGTCG TGCTGTTTGG CGAAGAGCAT GATAAGGTGT GGCAATTTGT 

201 CGAAGACGCG CTGCGTGCCG TCGTGCCTGC CGACAGTTTT GAACCGACCG 

251 CGCAAAAATT GAACCTGTTT AAGGCGGGTG CGGCAACTAT TTTGTTTTAT 

301 GAAGATCAAA ATGTCGTCAA AGGTTTGCAG GAGCAGTTCC CTGCTTATGC 

351 CGCCAACTTT CCCGTTTGGG CGGACCAGGC GAACGCGATG GTGCAGTATG 

401 CCGTTTGGAC GACACTTGCC GCGGTCGGCG TAGGTGCAAA CCTGCAACAT 

451 TACAATCCCT TGCCCGATGC GGCGATTGCC AAAGCGTGGA ATATCCCCGA 

501 AAACTGGTTG TTGCGCGCAC AAATGGTTAT CGGCGGTATT GAAGGGGCGG 

551 CAGGTGAAAA GACCTTTGAA CCAGTTGCAG AACGTTTGAA AGTGTTCGGC 

601 GCATAA 

This is predicted to encode a protein having amino acid sequence <SEQ ID 660>: 

1 MTRQSLQQAA ESRRSIYSLN KNLPVGKDEI VQIVEHAVLH TPSSFNSQSA 
51 RVVVLFGEEH DKVWQFVEDA LRAVVPADSF EPTAQKLNLF KAGAATILFY 

101 EDQNVVKGLQ EQFPAYAANF PVWADQANAM VQYAVWTTLA AVGVGANLQH 
151 YNPLPDAAIA KAWNIPENWL LRAQMVIGGI EGAAGEKTFE PVAERLKVFG 
201 A* 

ORF6a and ORF6-1 show 100.0% identity in 131 aa overlap: 

50 60 70 80 90 100 

orf 6a . pep TPSSFNSQSARVWLFGEEHDKWQFVEDALRAWPADSFEPTAQKLNLFKAGAATILFY 

I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I 
o r f 6 - 1 LRAWPADS FE PT AQKLNL FKAGAAT I LFY 

40 10 20 30 

110 120 130 140 150 160 

or f 6a . pep EDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA 
I M I M I I I I I I I I I I i I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I M M I I I I I 
45 orf6-l EDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHYNPLPDAAIA 

40 50 60 70 80 90 

170 180 190 200 

orf6a.pep KAWN I PENWLLRAQMVIGGI EGAAGEKT FE PVAERLKVFGAX 
50 I I I I I I I I I I I I I 1 [ I I I I I I I I I I I I I ! 1 I I I I II II I I I I 

o r f 6 - 1 KAWN I PEN WLLRAQMVI GGI EGAAGEKT FE PVAERLKVFGAX 

100 110 120 130 

Homology with a predicted ORF from N. gonorrhoeae 
ORF6 shows 95.7% identity over a 140aa overlap with a predicted ORF (ORF6ng) from 
N. gonorrhoeae: 

orf6.pep KVWQFVEXPLRAVVPADSFEPTAQKLNLFK 30 

I I I I I I I I I I I I I I I I I I I I I I I I : I I I 
orf6ng SNVSLDMSNPTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAVVPADSFEPTAQKLKLFK 64 

orf6.pep AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGVGANLQHY 

P P | ||||lMIIIMMIIlllllllMMMlllllll!:lililll 

orf6ng AGAATILFYEDQNVVKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGAGANLQHY 

orf6 oep NPLPDAAIAKAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGA 140 

| | | | | : I I I I I I I 1 I I M I I 1 I I I I I I 1 M I M I I 1 : 1 I I I I I I I I I 1 I I 
orf6ng NPLPDVAIAKAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGA 174 

The complete length ORF6ng nucleotide sequence <SEQ ID 661> was identified as: 

1 ATGGCCGTTG CGTCAAATGT CAGCTTGGAT ATGTCCAATC CTACGGTGTT 

51 ACGCATGGGA TTACCCTTAT ATATTGCGTC CCTAAGAAGG GGCGCAATAT 

101 ATAAGGTGTG GCAATTTGTC GAAGACGCGC TGCGTGCCGT CGTGCCTGCC 

151 GACAGTTTTG AACCGACCGC GCAAAAATTG AAGCTGTTTA AGGCGGGCGC 

201 GGCAACCATT TTGTTTTATG AAGATCAAAA TGTCGTCAAA GGTTTGCAGG 

251 AGCAGTTCCC TGCTTATGCC GCCAACTTTC CCGTTTGGGC GGACCAGGCG 

301 AACGCTATGG TACAGTATGC CGTCTGGACG ACACTTGCCG CGGTCGGTGC 

351 AGGTGCAAAT CTGCAACATT ACAACCCCTT GCCCGATGTG GCGATTGCTA 

401 AAGCGTGGAA TATTCCCGAA AACTGGCTGT TGCGCGCGCA AATGGTTATC 

451 GGTGGTATTG AAGGGGcggc aggtgaaaaa gtctttgaac CCGTTGCgga 

501 acgtttgAAA GTGTTCGGCG CATAA 

This encodes a protein having amino acid sequence <SEQ ID 662>: 

1 MAVASNVSLD MSNPTVLRMG LPLYIASLRR GAIYKVWQFV EDALRAVVPA 

51 DSFEPTAQKL KLFKAGAATI LFYEDQNVVK GLQEQFPAYA ANFPVWADQA 

101 NAMVQYAVWT TLAAVGAGAN LQHYNPLPDV AIAKAWNIPE NWLLRAQMVI 

151 GGIEGAAGEK VFEPVAERLK VFGA* 

ORF6ng and ORF6-1 show 96.9% identity in 131 aa overlap: 

10 20 30 

orf6-l pep LRAVVPADSFEPTAQKLNLFKAGAATILFY 
30 " I I I I I I I I I I I I I I I I I : I I I i [ I [ M I I I 

orf 6ng PTVLRMGLPLYIASLRRGAIYKVWQFVEDALRAWPADSFEPTAQKLKLFKAGAATILFY 
20 30 40 50 GO 70 

40 50 60 70 80 90 

35 orf 6-1 .pep E DQN WKGLQEQ FP AYAANFPVWADQANAMVQYAVWT T LAAVGVGANLQHYN PL PDAAI A 

I | | I I I I I I i I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I 11 I I I I i I I I I : I I I 
orf6ng EDQNWKGLQEQFPAYAANFPVWADQANAMVQYAVWTTLAAVGAGANLQHYNPLPDVAIA 
80 90 100 110 120 130 

40 100 110 120 130 

or f 6-1 . pep KAWNIPENWLLRAQMVIGGIEGAAGEKTFEPVAERLKVFGAX 

I I II II I I I I I I 1 I : I I I I I I I I II I I I I 

orf 6ng KAWNIPENWLLRAQMVIGGIEGAAGEKVFEPVAERLKVFGAX 
140 150 160 170 

45 

It is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 79 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 663>: 

1 ..GGCTACAACT ACCTGTTCGC GCGCGGCAGC CGCATCGCCA ACTACCAAAT 

51 CAACGGCATC CCCGTTGCCG ACGCGCTGGC CGATACGGGt CAATGCCAAC 

101 ACCGCCGCCT ATGAGCGCGT AGAAGTCGTG CGCGGCGTGG CGGGGCTGCT 

151 GGACGGCACG GGCGAGCCTT CCGCCACCGT CAATCTGGTG CGCAAACGCC 

201 TGACCCGCAA GCCATTGTTT GAAGTCCGCG CCGAAGCgGG CAACCGcAAA 

251 CATTTCGGGC TGGACGCGGA CGTATCGGGC AGCCTGAACA CCGAAG.crC 

301 rCTGCGCgGC CGCCTGGTTT CCAcCTTCGG ACGCGGCGAC TCGTGGCGGC 






351 GGCGCGAACG CAGCCGskAT GCCGAACTCT ACGGCATTTT GGAATACGAC 
401 ATCGCACCGC AAACCCGCGT CCACGCArGC ATGGACTACC AGCAGGCGAA 

451 AGAAACCGCC GACGCGCCGC TCAGcTACGC CGTGTACGAC AGCCAAGGTT 
501 ATGCCACCGC CTTCGGCCCG AAAGACAACC CCGCCACAAA TTGGGCGAAC 
551 AGCCACCACC GTGCGCTCAA CCTGTTCGCC GGCATCGAAC ACCGCTTCAA 
601 CCAAGACTGG AAACTCAAAG CCGAATACGA CTAC.. 

This corresponds to the amino acid sequence <SEQ ID 664; ORF23>: 

1 ..GYNYLFARGS RIANYQINGI PVADALADTG NANTAAYERV EVVRGVAGLL 

51 DGTGEPSATV NLVRKRLTRK PLFEVRAEAG NRKHFGLDAD VSGSLNTEXX 

101 LRGRLVSTFG RGDSWRRRER SRXAELYGIL EYDIAPQTRV HAXMDYQQAK 

151 ETADAPLSYA VYDSQGYATA FGPKDNPATN WANSHHRALN LFAGIEHRFN 

201 QDWKLKAEYD Y.. 

Further work revealed the complete nucleotide sequence <SEQ ID 665>: 

1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA 

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA 

101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 

251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC 

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC 

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC 

501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCTGACCCGC AAGCCATTGT 

551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGACGCG 

601 GACGTATCGG GCAGCCTGAA CACCGAAGGC ACGCTGCGCG GCCGCCTGGT 

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCGGCGCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC 

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 
851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC 
901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA 
951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG 

1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC 

1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTGAT 

1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA 

1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA 

1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA 

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 

1351 ATTTTGGGCG GACGATACAC CCGTTACCGC ACCGGCAGCT ACGACAGCCG 

1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG 

1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CTCTTTACGG CTCGTACAGC 

1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA 

1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG 

1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 

1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC 

1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 

1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC 

1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT 
1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA 
1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC 
1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG 
2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 
2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC 
2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 
2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 

This corresponds to the amino acid sequence <SEQ ID 666; ORF23-1>: 

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 

51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG 

101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER 

151 VEVVRGVAGL LDGTGEPSAT VNLVRKRLTR KPLFEVRAEA GNRKHFGLDA 

201 DVSGSLNTEG TLRGRLVSTF GRGDSWRRRE RSRDAELYGI LEYDIAPQTR 

251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL 

301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP 

351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 

401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL 

451 ILGGRYTRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS 

501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN 

551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR 

601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA 

651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH 

701 YRTQPDRHSY GALRTVNAAF TYRFK* 

Computer analysis of this amino acid sequence gave the following results: 

Homology with the ferric-pseudobactin receptor PupB of Pseudomonas putida (accession number P38047) 

ORF23 and PupB protein show 32% aa identity in 205aa overlap: 

Orf23 FARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRK 65 

++RG I NY+++G+P + L D + + A ++RVE+VRG G1+ G G PSAT+NL+RK 

PupB 215 

[Remainder of the alignment not recoverable from the source. Surviving row start positions: Orf23 66, PupB 274; Orf23 126, PupB 334; Orf23 184, PupB 392. Surviving match-line fragment: +YGI E+D++] 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF23 shows 95.7% identity over a 211aa overlap with an ORF (ORF23a) from strain A of N. meningitidis: 



10 20 30 

orf23.pep GYNYLFARGSRIANYQINGIPVADALADTG 

I I I I I I I I I I I I M II I I I I I I I I I I I M I 

orf23a QMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIPVADALADTG 
90 100 110 120 130 140 

40 50 60 70 80 90 

orf23.pep NANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDAD 
I I I I I I [ I [ I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I 1 I I I I I I II 
orf23a NANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRPTRKPLFEVRAEAGNRKHFGLGAD 
150 160 170 180 190 200 

100 110 120 130 140 150 

orf23.pep VSGSLNTEXXLRGRLVSTFGRGDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAK 

I II II I : I : i M I I I I I I I I I I I I I : II I I I I I M I I M M M I I I I I I I I I I I I I I 
orf23a VSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGILEYDIAPQTRVHAGMDYQQAK 
210 220 230 240 250 260 

160 170 180 190 200 210 

orf23.pep ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYD 

III I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I Ml 

orf23a ETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRALNLFAGIEHRFNQDWKLKAEYD 
270 280 290 300 310 320 

The complete length ORF23a nucleotide sequence <SEQ ID 667> is: 

1 ATGACACGCT TCAAATATTC CCTGCTGTTT GCCGCCCTGT TGCCCGTGTA 

51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCAAAACCG CAGGAAAGCA 

101 CTGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC 

151 GACGGCTACA CTGTTTCCGG CACGCACACC CCGCTCGGGC TGCCCATGAC 

201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC 

251 GCGACCAAAA CATCAAAGCG CTCGACCGCG CCCTGTTGCA GGCGACCGGC 

301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT 

351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG 

401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC 

451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CTGGACGGCA CGGGCGAGCC 

501 TTCCGCCACC GTCAATCTGG TGCGCAAACG CCCGACCCGC AAGCCATTGT 

551 TTGAAGTCCG CGCCGAAGCG GGCAACCGCA AACATTTCGG GCTGGGCGCG 

601 GACGTATCGG GCAGCCTGAA TGCCGAAGGC ACGCTGCGCG GCCGCCTGGT 

651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCGCGAA CGCAGCCGCG 

701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC 

751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CCGACGCGCC 

801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC 

851 CGAAAGACAA CCCCGCCACA AATTGGGCGA ACAGCCGCCA CCGTGCGCTC 

901 AACCTGTTCG CCGGCATCGA ACACCGCTTC AACCAAGACT GGAAACTCAA 

951 AGCCGAATAC GACTACACCC GCAGCCGCTT CCGCCAGCCC TACGGCGTAG 

1001 CAGGCGTGCT TTCCATCGAC CACAACACCG CCGCCACCGA CCTGATTCCC 

1051 GGTTATTGGC ACGCCGACCC GCGCACCCAC AGCGCCAGCG TGTCATTAAT 

1101 CGGCAAATAC CGCCTGTTCG GCCGCGAACA CGATTTAATC GCGGGTATCA 

1151 ACGGTTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATCCCC 

1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGTG CCTACCCGCA 

1251 GCCTGCATCG TTTGCCCAAA CCATCCCGCA ATACGGCACC AGGCGGCAAA 

1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG 

1351 ATACTCGGCG GCAGATACAG CCGTTACCGC ACCGGCAGCT ACGACAGCCG 

1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG 

1451 GCATCGTGTT CGACCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC 

1501 AGCCTGTTCG TCCCGCAATC GCAAAAAGAC GAACACGGCA GCTACCTGAA 

1551 ACCCGTAACC GGCAACAATC TGGAAGCCGG CATCAAAGGC GAATGGCTTG 

1601 AAGGCCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC 

1651 CTCGCCACCG CAGCAGGACG CGACCCGAGC GGCAACACCT ACTACCGCGC 

1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA 

1751 TCACGCCCGA ATGGCAGATA CAGGCAGGTT ACAGCCAAAG CAAAACCCGC 

1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTACCCG AACGCAGCTT 

1851 CAAACTCTTC ACTGCCTACC ACTTTGCCCC CGAAGCCCCC AGCGGCTGGA 

1901 CCATCGGCGC AGGCGTGCGC TGGCAGAGCG AAACCCACAC CGACCCTGCC 

1951 ACGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG CCGACAACAG 

2001 CCGCCAAAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA 

2051 ATCCGCGCGC CGAACTGTCG CTGAACGTGG ACAATCTGTT CAACAAACAC 

2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA 

2151 CGCGGCGTTT ACCTATCGGT TTAAATAA 

This encodes a protein having amino acid sequence <SEQ ID 668>: 



1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN 
51 DGYTVSGTHT PLGLPMTLRE IPQSVSVITS QQMRDQNIKA LDRALLQATG 
101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER 
151 VEVVRGVAGL LDGTGEPSAT VNLVRKRPTR KPLFEVRAEA GNRKHFGLGA 
201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQRE RSRDAELYGI LEYDIAPQTR 
251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWANSRHRAL 
301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HNTAATDLIP 
351 GYWHADPRTH SASVSLIGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP 
401 NAIPNAYEFS RTGAYPQPAS FAQTIPQYGT RRQIGGYLAT RFRAADNLSL 
451 ILGGRYSRYR TGSYDSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS 
501 SLFVPQSQKD EHGSYLKPVT GNNLEAGIKG EWLEGRLNAS AAVYRARKNN 
551 LATAAGRDPS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKTR 
601 DQDGSRLNPD SVPERSFKLF TAYHFAPEAP SGWTIGAGVR WQSETHTDPA 
651 TLRIPNPAAK ARAADNSRQK AYAVADIMAR YRFNPRAELS LNVDNLFNKH 
701 YRTQPDRHSY GALRTVNAAF TYRFK* 






ORF23a and ORF23-1 show 99.2% identity in 725 aa overlap: 

orf23a.pep 
111 [ I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I M I I I M I I I I I II I 
orf23-1 

70 80 90 100 110 120 

orf23a.pep PLGLPMTLREIPQSVSVITSQQMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARG 

| | | | | | | | | | | I I I II I I I I II II I I I II = I I I 1 II I I M I M I I I I II I I II I I I I M I 
orf23-1 PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG 
70 80 90 100 110 120 

130 140 150 160 170 180 

SRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRPTR
| | | | | | | | | | | | | | I I I I I M I I I I I i I II I I I M M I II II II M I I I I I I I I I II II 
SRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRLTR

130 140 150 160 170 180 

190 200 210 220 230 240 

KPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGRGDSWRQRERSRDAELYGI 
! I I I I I I I I I I I I I I I I I I I I I I I M : I I I I I I I I I I I I I I I I I I I : II I I I I I I I I I I 
KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI 

190 200 210 220 230 240 

250 260 270 280 290 300 

LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 
I I I I I ] I I I ! I I I I II II I I I I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I M I I 
LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 

250 260 270 280 290 300 

310 320 330 340 350 360 

NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH
I I I I I I I I I I I I I I M II I I I II I I I I I II I I I I I I I I I I I I II II I I I I I I I I I I I I I I 
NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH

310 320 330 340 350 360 

370 380 390 400 410 420 

SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS 
i M I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I M I I 
SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS

370 380 390 400 410 420 

430 440 450 460 470 480 

FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYSRYRTGSYDSRTQGMTYVSANRFT 
I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I 1 I I I I I I I I I I I 1 M I 
FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT 

430 440 450 460 470 480 

490 500 510 520 530 540 

PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 
I I I M I I I I I I I I I I II I I I M I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I 
PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 

490 500 510 520 530 540 

550 560 570 580 590 600 

AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 
I I! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I M I I I I I I I I I I I 
AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR 

550 560 570 580 590 600 

610 620 630 640 650 660 

DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 
I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I II II II II I II I I I I I I I I 1 I 1 I 
DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK 

610 620 630 640 650 660 

670 680 690 700 710 720 

ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF

I I I I ! I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF
670 680 690 700 710 720 
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orf23a.pep TYRFKX 
I I I I I I 

orf23-l TYRFKX 
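The identity figures quoted for these pairwise alignments can be recomputed from the aligned rows. The following is a minimal sketch and not the program used to produce the alignments in this document: the function name `percent_identity` and the choice to count every aligned column, gaps included, in the denominator are assumptions made for illustration.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> tuple[int, int, float]:
    """Return (identical columns, aligned columns, percent identity)."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    # Columns where both rows hold a gap carry no information and are dropped.
    columns = [(a, b) for a, b in zip(row_a.upper(), row_b.upper())
               if not (a == gap and b == gap)]
    # A column is identical when both rows hold the same residue (not a gap).
    identical = sum(1 for a, b in columns if a == b and a != gap)
    total = len(columns)
    return identical, total, (100.0 * identical / total if total else 0.0)


# Residues 61-120 of the ORF23a / ORF23-1 alignment shown above.
orf23a = "PLGLPMTLREIPQSVSVITSQQMRDQNIKALDRALLQATGTSRQIYGSDRAGYNYLFARG"
orf23_1 = "PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG"
identical, total, pct = percent_identity(orf23a, orf23_1)
print(f"{identical}/{total} identical ({pct:.1f}%)")
```

Applied block by block and summed over the full overlap, the same count gives the overall identity for a pair of sequences.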

Homology with a predicted ORF from N. gonorrhoeae 

ORF23 shows 93.4% identity over a 211 aa overlap with a predicted ORF (ORF23.ng) from N.
gonorrhoeae:

orf23.pep          GYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLD 51

I M I I I I I I I I I I I I I I I I I I I !! i I I I I I I I I I I I I I I I I I I I I I > M I 
orf23ng   SAVDACRIPGYNYLFARGSRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLPD 60

orf23 pep GTGEPSATVNLVRKRLTRKPLFEVRAEAGNRKHFGLDADVSGSLNTEXXLRGRLVSTFGR 111 

| | | | | | | | | | I I I I : I I I I I I I I I I I I I I I I I I I I 11111111:1 : I I II I I I I I M 
orf23ng GTGEPSATVNLVRKHPTRKPLFEVRAEAGNRKHFGLGADVSGSLNAEGTLRGRLVSTFGR 120 

orf23.pep GDSWRRRERSRXAELYGILEYDIAPQTRVHAXMDYQQAKETADAPLSYAVYDSQGYATAF 171

Mill: I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 
orf23ng GDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAF 180 

orf23.pep GPKDNPATNWANSHHRALNLFAGIEHRFNQDWKLKAEYDY 211 

I I I I I I I I I I : i I : : I I I I i I I I II I I I I I I I II I II I I I 
orf23ng   GPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHS 240

The ORF23ng nucleotide sequence <SEQ ID 669> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 670>: 

1 SAVDACRIPG YNYLFARGSR IANYQINGIP VADALADTGN ANTAAYERVE
51 VVRGVAGLPD GTGEPSATVN LVRKHPTRKP LFEVRAEAGN RKHFGLGADV
101 SGSLNAEGTL RGRLVSTFGR GDSWRQLERS RDAELYGILE YDIAPQTRVH
151 AGMDYQQAKE TADAPLSYAV YDSQGYATAF GPKDNPATNW SNSRNRALNL
201 FAGIEHRFNQ DWKLKAEYDY TRSRFRQPYG VAGVLSIDHS TAATDLIPGY
251 WHADPRTHSA SMSLTGKYRL FGREHDLIAG INGYKYASNK YGERSIIPNA
301 IPNAYEFSRT GAYPQPSSFA QTIPQYDTRR QIGGYLATRF RAADNLSLIL
351 GGRYSRYRAG SYNSRTQGMT YVSANRFTPY TGIVFDLTGN LSLYGSYSSL
401 FVPQLQKDEH GSYLKPVTGN NLEADIKGEW LEGRLNASAA VYRARKNNLA
451 TAAGRDQSGN TYYRAANQAK THGWEIEVGG RITPEWQIQA GYSQSKPRDQ
501 DGSRLNPDSV PERSFKLFTA YHLAPEAPSG RTIGAGVRRQ GETHTDPAAL
551 RIPNPAAKAR AVANSRQKAY AVADIMARYR FNPRTELSLN VDNLFNKHYR
601 TQPDRHSYGA LRTVNAAFTY RFK*

Further work revealed the complete nucleotide sequence <SEQ ID 671>:

1 ATGACACGCT TCAAATACTC CCTGCTTTTT GCCGCCCTGC TACCCGTGTA
51 CGCGCAGGCC GATGTTTCTG TTTCAGACGA CCCCAAACCG CAGGAAAGCA
101 CCGAATTGCC GACCATCACC GTTACCGCCG ACCGCACCGC GAGTTCCAAC
151 GACGGCTACA CCGTTTCCGG CACGCACACC CCGTTCGGGC TGCCCATGAC
201 CCTGCGCGAA ATCCCGCAGA GCGTCAGCGT CATCACATCG CAACAAATGC
251 GCGACCAAAA CATCAAAACG CTCGACCGCG CCCTGTTGCA GGCGACCGGC
301 ACCAGCCGCC AGATTTACGG CTCCGACCGC GCGGGCTACA ACTACCTGTT
351 CGCGCGCGGC AGCCGCATCG CCAACTACCA AATCAACGGC ATCCCCGTTG
401 CCGACGCGCT GGCCGATACG GGCAATGCCA ACACCGCCGC CTATGAGCGC
451 GTAGAAGTCG TGCGCGGCGT GGCGGGGCTG CCGGACGGCA CGGGCGAGCC
501 TTCTGCCACC GTCAATCTGG TACGCAAACA CCCGACCCGC AAGCCATTGT
551 TTGAAGTCCG CGCCGAAGCC GGCAACCGCA AACATTTCGG GCTGGGCGCG
601 GACGTATCGG GCAGCCTGAA CGCCGAAGGC ACGCTGCGCG GCCGCCTGGT
651 TTCCACCTTC GGACGCGGCG ACTCGTGGCG GCAGCTCGAA CGCAGCCGCG
701 ATGCCGAACT CTACGGCATT TTGGAATACG ACATCGCACC GCAAACCCGC
751 GTCCACGCAG GCATGGACTA CCAGCAGGCG AAAGAAACCG CAGACGCGCC
801 GCTCAGCTAC GCCGTGTACG ACAGCCAAGG TTATGCCACC GCCTTCGGCC
851 CAAAAGACAA CCCCGCCACA AATTGGTCGA ACAGCCGCAA CCGTGCGCTC
901 AACCTGTTCG CCGGCATAGA ACACCGCTTC AACCAAGACT GGAAACTCAA
951 AGCCGAATAC GACTACACCC GTAGCCGCTT CCGCCAGCCC TACGGTGTGG
1001 CAGGCGTACT TTCCATCGAC CACAGCACTG CCGCCACCGA CCTGATTCCC
1051 GGTTATTGGC ACGCcgatcc GCGCACCCAC AGCGCCAGCA TGTCATTGAC
1101 CGGCAAATAC CgcctGTTCG GCCGCGAGCA CGATTTAATC GCGGGTATCA
1151 ACGGCTACAA ATACGCCAGC AACAAATACG GCGAACGCAG CATCATTCCC
1201 AACGCCATTC CCAACGCCTA CGAATTTTCC CGCACGGGCG CCTATCCGCA
1251 GCCATCATCG TTTGCCCAAA CCATCCCGCA ATACGACACC AGGCGGCAAA
1301 TCGGCGGCTA TCTCGCCACC CGTTTCCGCG CCGCCGACAA CCTTTCGCTG
1351 ATACTCGGCG GCAGATACAG CCGCTACCGC GCAGGCAGCT ACAACAGCCG
1401 CACACAAGGC ATGACCTATG TGTCCGCCAA CCGTTTCACC CCCTACACAG
1451 GCATCGTGTT CGATCTGACC GGCAACCTGT CGCTTTACGG CTCGTACAGC
1501 AGCCTGTTCG TCCCGCAATT GCAAAAAGAC GAACACGGCA GCTACCTGAA
1551 ACCCGTAACC GGCAACAATC TGGAAGCCGA CATCAAAGGC GAATGGCTTG
1601 AAGGGCGTCT GAACGCATCC GCCGCCGTGT ACCGCGCCCG TAAAAACAAC
1651 CTCGCCACCG CAGCAGGACG CGACCAGAGC GGCAACACCT ACTATCGCGC
1701 CGCCAACCAA GCCAAAACCC ACGGCTGGGA AATCGAAGTC GGCGGCCGCA
1751 TCACGCCCGA ATGGCAGATA CAGGCAGGCT ACAGCCAAAG CAAACCCCGC
1801 GACCAAGACG GCAGCCGCCT GAACCCCGAC AGCGTAcCCG AACGCAGCTT
1851 CAAACTCTTC ACCGCCTACC ACTTAGCCCC CGAAGCCCCC AGCGGCCGGA
1901 CCATcggTGC GGGTGTGCGC CGGCAGGGCG AAACCCACAC CGACCCAGCC
1951 GCGCTCCGCA TCCCCAACCC CGCCGCCAAA GCCCGCGCCG TCGCCAACAG
2001 CCGCCAGAAA GCCTACGCCG TCGCCGACAT CATGGCGCGT TACCGCTTCA
2051 ATCCGCGCAC CGAACTGTCG CTGAACGTGG ACAACCTGTT CAACAAACAC
2101 TACCGCACCC AGCCCGACCG CCACAGCTAC GGCGCACTGC GGACAGTGAA
2151 CGCGGCGTTT ACCTATCGGT TTAAATAA

This corresponds to the amino acid sequence <SEQ ID 672; ORF23ng-1>:

1 MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN
51 DGYTVSGTHT PFGLPMTLRE IPQSVSVITS QQMRDQNIKT LDRALLQATG
101 TSRQIYGSDR AGYNYLFARG SRIANYQING IPVADALADT GNANTAAYER
151 VEVVRGVAGL PDGTGEPSAT VNLVRKHPTR KPLFEVRAEA GNRKHFGLGA
201 DVSGSLNAEG TLRGRLVSTF GRGDSWRQLE RSRDAELYGI LEYDIAPQTR
251 VHAGMDYQQA KETADAPLSY AVYDSQGYAT AFGPKDNPAT NWSNSRNRAL
301 NLFAGIEHRF NQDWKLKAEY DYTRSRFRQP YGVAGVLSID HSTAATDLIP
351 GYWHADPRTH SASMSLTGKY RLFGREHDLI AGINGYKYAS NKYGERSIIP
401 NAIPNAYEFS RTGAYPQPSS FAQTIPQYDT RRQIGGYLAT RFRAADNLSL
451 ILGGRYSRYR AGSYNSRTQG MTYVSANRFT PYTGIVFDLT GNLSLYGSYS
501 SLFVPQLQKD EHGSYLKPVT GNNLEADIKG EWLEGRLNAS AAVYRARKNN
551 LATAAGRDQS GNTYYRAANQ AKTHGWEIEV GGRITPEWQI QAGYSQSKPR
601 DQDGSRLNPD SVPERSFKLF TAYHLAPEAP SGRTIGAGVR RQGETHTDPA
651 ALRIPNPAAK ARAVANSRQK AYAVADIMAR YRFNPRTELS LNVDNLFNKH
701 YRTQPDRHSY GALRTVNAAF TYRFK*

ORF23ng-1 and ORF23-1 show 95.9% identity in 725 aa overlap:

10 20 30 40 50 60 

orf23-1.pep MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT

orf23ng-1   MTRFKYSLLFAALLPVYAQADVSVSDDPKPQESTELPTITVTADRTASSNDGYTVSGTHT
10 20 30 40 50 60

70 80 90 100 110 120 

orf23-1.pep PLGLPMTLREIPQSVSVITSQQMRDQNIKTLDRALLQATGTSRQIYGSDRAGYNYLFARG



130 140 150 160 170 180 

orf23-1.pep SRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLLDGTGEPSATVNLVRKRLTR

I I I I I I I I I I I I t I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I : I I 
orf23ng-1   SRIANYQINGIPVADALADTGNANTAAYERVEVVRGVAGLPDGTGEPSATVNLVRKHPTR

130 140 150 160 170 180 

190 200 210 220 230 240 

orf23-1.pep KPLFEVRAEAGNRKHFGLDADVSGSLNTEGTLRGRLVSTFGRGDSWRRRERSRDAELYGI
I I I I 1 I 1 I I I I I I I I I I i I I I I I I I I : I I I I I I I I M I! I i I I I I I : I I I I I I I I I I I 

rf23ng-l 



65 



orf23-l .pep 



250 260 270 280 290 300 

LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWANSRHRAL 
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I I M M I I I M I I I I I I I H I I I N I I I I I I II I I I I I M I I I I I I I I I M I : I I I : II I 

nrfjw.i LEYDIAPQTRVHAGMDYQQAKETADAPLSYAVYDSQGYATAFGPKDNPATNWSNSRNRAL 

g 250 260 270 280 290 300 

c 310 320 330 340 350 360 

orf23-1.pep NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHNTAATDLIPGYWHADPRTH

| | | I I Ill I I 1 I I I I I I I I I I I I I I I I I I I I: I I I I I I I I I I I 1 

orf23ng-l NLFAGIEHRFNQDWKLKAEYDYTRSRFRQPYGVAGVLSIDHSTAATDLIPGYWHADPRTH 
310 320 330 340 350 360 

^ 370 380 390 400 410 420 

orf23-l pep SASVSLIGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPAS 
I | | : | | I I I I I I I I I I I I I I I I I I I I I I I I I I I I M II I I I II II I I I I I I I I I I I I : I 

orf23ng-1   SASMSLTGKYRLFGREHDLIAGINGYKYASNKYGERSIIPNAIPNAYEFSRTGAYPQPSS

370 380 390 400 410 420

430 440 450 460 470 480 

orf23-l pep FAQTIPQYGTRRQIGGYLATRFRAADNLSLILGGRYTRYRTGSYDSRTQGMTYVSANRFT 
I I I I I I I I I I I I I I! I I I I I I I I I I I I I I I I I I I I : I I I : I I I : I I I I I I I I I M I I I I 
orf23ng-1   FAQTIPQYDTRRQIGGYLATRFRAADNLSLILGGRYSRYRAGSYNSRTQGMTYVSANRFT

430 440 450 460 470 480 

490 500 510 520 530 540 

orf23-l pep PYTGIVFDLTGNLSLYGSYSSLFVPQSQKDEHGSYLKPVTGNNLEAGIKGEWLEGRLNAS 
25 II I I I II I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I M I II I I I II I II 

orf23ng-l PYTGIVFDLTGNLSLYGSYSSLFVPQLQKDEHGSYLKPVTGNNLEADIKGEWLEGRLNAS 

490 500 510 520 530 540 

550 560 570 580 590 600 

orf23-1.pep AAVYRARKNNLATAAGRDPSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKTR

I I I II i I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf23ng-l AAVYRARKNNLATAAGRDQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPR 
550 560 570 580 590 600 

610 620 630 640 650 660

orf23-1.pep DQDGSRLNPDSVPERSFKLFTAYHFAPEAPSGWTIGAGVRWQSETHTDPATLRIPNPAAK
I I I M II I I I I I I I I I I II I M I I : I I I I I I I I I I I I I I I : I I I I I I I : I I II I I I I I 
orf23ng-l DQDGSRLNPDSVPERSFKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRIPNPAAK 
610 620 630 640 650 660 


670 680 690 700 710 720 

orf23-1.pep ARAADNSRQKAYAVADIMARYRFNPRAELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF
III: II I II I I I I I I II I I I I I I I I : I I I II I II I I I I I I I I I I I I I I I I I I I I I I I I I 
orf23ng-1   ARAVANSRQKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRHSYGALRTVNAAF
670 680 690 700 710 720

orf23-1.pep TYRFKX
            ||||||
orf23ng-1   TYRFKX

In addition, ORF23ng-1 shows significant homology with an OMP from E. coli:

sp|P16869|FHUE_ECOLI OUTER-MEMBRANE RECEPTOR FOR FE(III)-COPROGEN, FE(III)-
FERRIOXAMINE B AND FE(III)-RHODOTRULIC ACID PRECURSOR >gi|1651542|gnl|PID|d1015403
(D90745) Outer membrane protein FhuE precursor [Escherichia coli]
>gi|1651545|gnl|PID|d1015405 (D90746) Outer membrane protein FhuE precursor
[Escherichia coli] >gi|1787344 (AE000210) outer-membrane receptor for Fe(III)-
coprogen, Fe(III)-ferrioxamine B and Fe(III)-rhodotrulic acid precursor
[Escherichia coli] Length = 729
Score = 332 bits (843), Expect = 3e-90
Identities = 228/717 (31%), Positives = 350/717 (48%), Gaps = 60/717 (8%)

Query: 38 TITVTADRTASSN--DGYTVSGTHTPFGLPMTLREIPQSVSVITSQQMRDQNIKTLDRAL 95

T+ V TA + + Y+V+ T + MT R+IPQSV++++ Q+M DQ ++TL + 

Sbjct: 43 TVIVEGSATAPDDGENDYSVTSTSAGTKMQMTQRDIPQSVTIVSQQRMEDQQLQTLGEVM 102 

65 

Query: 96 LQATGTSRQIYGSDRAGYNYLFARGSRIANYQINGIP VADALADTGNANTAA 147 

G S+ SDRA Y ++RG +1 NY ++GIP + DAL+D A 
Sbjct: 103 ENTLGISKSQADSDRALY YSRGFQI DN YMVDGI PT YFE SRWNLGDALS DM AL 154 






Query: 148 YERVEVVRGVAGLPDGTGEPSATVNLVRKHPTRKPLF-EVRAEAGNRKHFGLGADVSGSL 206

+ERVEVVRG GL GTG PSA +N+VRKH T + +V AE G+ AD+ L

Sbjct: 155 FERVEVVRGATGLMTGTGNPSAAINMVRKHATSREFKGDVSAEYGSWNKERYVADLQSPL 214 

Query: 207 NAEGTLRGRLVSTFGRGDSWRQLERSRDAELYGILEYDIAPQTRVHAGMDYQQAKETADA 266

+G +R R+V + DSW S GI++ D+ T + AG +YQ+ + 

Sbjct: 215 TEDGKIRARIVGGYQNNDSWLDRYNSEKTFFSGIVDADLGDLTTLSAGYEYQRIDVNSPT 274

Query: 267 PLSYAVYDSQGYATAFGPKDNPATNWSNSRNRALNLFAGIEHRFNQDWKLKAEYDYTRSR 326 

+++ G + ++ + A +W+ + +F ++ +F W+ ++ 

Sbjct: 275 WGGLPRWNTDGSSNSYDRARSTAPDWAYNDKEINKVFMTLKQQFADTWQATLNATHSEVE 334 

Query: 327 F--RQPYGVAGVLSIDHSTAA--TDLIPGY-------WHADPRTHSA-SMSLTGKYRLFG 374

F + Y A V D ++ PG+ W++ R A + G Y LFG 

Sbjct: 335 FDSKMMYVDAYVNKADGMLVGPYSNYGPGFDYVGGTGWNSGKRKVDALDLFADGSYELFG 394

Query: 375 REHDLIAGINGYKYASNKYGER--SIIPNAIPNAYEFSRTGAYPQPSSFAQTIPQYDTRR 432

R+H+L+ G Y +N+Y +I P+ I + Y F+ G +PQ Q++ Q DT

Sbjct: 395 RQHNLMFG-GSYSKQNNRYFSSWANIFPDEIGSFYNFN--GNFPQTDWSPQSLAQDDTTH 451

Query: 433 QIGGYLATRFRAADNLSLILGGRYSRYRAGSYNSRTQGMTY-VSANRFTPYTGIVFDXXX 491

Y ATR AD L LILG RY+ +R + +TY + N TPY G+VFD 

Sbjct: 452 MKSLYAATRVTLADPLHLILGARYTNWRVDT-------LTYSMEKNHTTPYAGLVFDIND 504

Query: 492 XXXXXXXXXXXFVPQLQKDEHGSYLKPVTGNNLEADIKGEWLEGRLNASAAVYRARKNNL 551

F PQ +D G YL P+TGNN E +K +W+ RL + A++R ++N+ 
Sbjct: 505 NWSTYASYTSIFQPQNDRDSSGKYLAPITGNNYELGLKSDWMNSRLTTTLAIFRIEQDNV 564

Query: 552 ATAAGR---DQSGNTYYRAANQAKTHGWEIEVGGRITPEWQIQAGYSQSKPRDQDGSRLN 608

A + G +G T Y+A + + G E E+ G IT WQ+ G ++ D +G+ +N 

Sbjct: 565 AQSTGTPIPGSNGETAYKAVDGTVSKGVEFELNGAITDNWQLTFGATRYIAEDNEGNAVN 624 

Query: 609 PDSVPERSFKLFTAYHLAPEAPSGRTIGAGVRRQGETHTDPAALRIPNPAAKARAVANSR 668 

P ++P + K+FT+Y LP P T+G GV Q +TD P RA 
Sbjct: 625 P-NLPRTTVKMFTSYRL-PVMPE-LTVGGGVNWQNRVYTDTV TPYGTFRA E 672 

Query: 669 QKAYAVADIMARYRFNPRTELSLNVDNLFNKHYRTQPDRH-SYGALRTVNAAFTYRF 724

Q +YA+ D+ RY+ L NV+NLF+K Y T + YG R + TY+F 

Sbjct: 673 QGSYALVDLFTRYQVTKNFSLQGNVNNLFDKTYDTNVEGSIVYGTPRNFSITGTYQF 729

Based on this analysis, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF23-1 (77.5kDa) was cloned in pET and pGex vectors and expressed in E.coli, as described 
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure 
15A shows the results of affinity purification of the His-fusion protein, and Figure 15B shows the 
results of expression of the GST-fusion in E.coli. Purified His-fusion protein was used to immunise 
mice, whose sera were used for Western blot (Figure 15C) and for ELISA (positive result). These 
experiments confirm that ORF23-1 is a surface-exposed protein, and that it is a useful immunogen. 
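A size such as the 77.5 kDa quoted for ORF23-1 is the kind of figure that can be estimated directly from an amino acid sequence. The sketch below is illustrative only and is not the method used in this document: the function name `average_mass_kda` is hypothetical, the table holds standard average residue masses, and the result ignores any fusion tag, signal-peptide cleavage or post-translational modification.

```python
# Average residue masses in daltons for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # one water molecule accounts for the free N- and C-termini


def average_mass_kda(protein: str) -> float:
    """Estimate the average molecular mass (kDa) of an unmodified polypeptide."""
    # Drop layout whitespace and the '*' that marks the stop codon in listings.
    seq = "".join(protein.split()).replace("*", "").upper()
    unknown = set(seq) - RESIDUE_MASS.keys()
    if unknown:
        raise ValueError(f"no mass defined for residues: {sorted(unknown)}")
    return (sum(RESIDUE_MASS[r] for r in seq) + WATER) / 1000.0


# First 50 residues of SEQ ID 672, copied from the listing above.
print(round(average_mass_kda(
    "MTRFKYSLLF AALLPVYAQA DVSVSDDPKP QESTELPTIT VTADRTASSN"), 2))
```

Sequences containing the ambiguity code X (as in several listings here) are rejected rather than guessed at, since their mass is undefined.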

Example 80 

The following partial DNA sequence was identified in N.meningitidis <SEQ ID 673>: 

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 
51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 
101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC
151 AGCGTCAgcA CGCCTGCTTC GGCGgcGgCa ATCATACCTT CGTCTTCGGA

201 AACGGGGATA AACGcGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA 

251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TnTTCAAGAA TGCGTGCCAC 

351 TnAGTCGCCG ACGGGG . . 

This corresponds to the amino acid sequence <SEQ ID 674; ORF24>: 

1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI XSRMRATXSP TG. .

Further work revealed the complete nucleotide sequence <SEQ ID 675>: 



1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA 

101 TCATATCCAA GCCGACCGAA CAAACGGCGG TCATGGCTTC GAGTTTGTCC 

151 AGCGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA 

201 AACGGGGATA AACGCGCCAC TCAAACCCCC GACCGCGCTG GAAGCCATCA 

251 TGCCGCCTTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGTGCCAC 

351 TGAGTCGCCG ACGGCGGGGG TCGGCGCCAG CGACAAGTCG AGAATACCAA

401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG

451 CGGGTAATTT TGAAAGCAGT TTTCTTCACT ACTTCCGCAA CTTCGGTCAA

501 TGTCGTTGCA TCTGAATTTT CCAACGCGGC TTTTACGACA CCTGGGCCGG 

551 ATACGCCGAC ATTGATAACG GCATCCGCTT CGCCCGAACC ATGAAACGCG 

601 CCCGCCATAA ACGGGTTGTC TTCCACCGCG TTGCAGAACA CGACAATTTT 

651 AGCGCAGCCG AAACCTTCGG GCGTGATTTC CGCCGTGCGT TTGACGGTTT 

701 CGCCCGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTACTGCCG

751 ATATTGATGG AGCTGCACAC AATATCGGTA GTCTTCATCG CTTCGGGAAT

801 GGAGCGGATT AACACCTCAT CCGAAGGCGA CATCCCTTTT TGCACCAACG 

851 CGGAAAAACC GCCGATAAAA GACACACCGA TGGCTTTGGC AGCTTTATCC 

901 AAAGTTTGCG CCACGCTGAC GTAA 

This corresponds to the amino acid sequence <SEQ ID 676; ORF24-1>:



1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISKPTE QTAVMASSLS
51 SVSTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT
151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA
201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LTVSPASLTA SILIPARVLP
251 ILMELHTISV VFIASGMERI NTSSEGDIPF CTNAEKPPIK DTPMALAALS
301 KVCATLT*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF24 shows 96.4% identity over a 307 aa overlap with an ORF (ORF24a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf24a.pep MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA
I I I 1 I I I i I I I II I I I I I I I I I I I I I I I M I I I I I I 1 I I I I I I : I I I I I : I I I I I I I I I 
orf24      MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA

10 20 30 40 50 60 



70 80 90 100 110 120 

IIPSSSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP
I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I [ I [ I I I I I I I I I I I I I I I I I II I I I 
IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP 

70 80 90 100 110 120 

130 140 150 160 170 180 

TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNVVASEFSNAAFTT

I I I I I I I I I I I I I I I II II I I I I I I I I I I I I 1 I I M II I I I I II M I II II I I II I ! I I I 

TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNVVASEFSNAAFTT






190 200 210 220 230 240 

PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA
M | | | | | | | | 1 I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I : I I I I I I I I I I I I I I 
PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

SILIPARVLPILMELHTISVVFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS 
M I [ II I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I : I I I I I I M I I I I U I I I 
SILIPARVLPILMELHTISVVFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS

250 260 270 280 290 300 



orf24a.pep KVCATLTX 
I I I I I I I I 
orf24 KVCATLTX 

The complete length ORF24a nucleotide sequence <SEQ ID 677> is: 

1 ATGCGCACGG CAGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC 

51 GGCAATGATG CCGGAAATGG TGTGCGCGGG TGTGTCGCCG GGAACGGCAA 

101 TCATATCCAA NCCGACCGAA CAAACGGCGG TCATCGCTTC GAGTTTATCC 

151 AACGTCAGCA CGCCTGCTTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA

201 NACGGGGATA AACGCGCCAC TCAAACCGCC AACCGCGCTC GAAGCCATCA 

251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG 

301 CCGTGCGTAC CGCAGACGCT CAAACCCATT TCTTCAAGAA TGCGCGCCAC 

351 CGAGTCGCCG ACGGCAGGGG TCGGTGCCAG CGACAAGTCG AGAATACCAA 

401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GGCCGATGAG TTCGCCCACG 

451 CGGGTAATTT TGAAGGCGGT TTTCTTCACA ACTTCGGCAA CTTCGGTCAA 

501 TGTCGTTGCA TCCGAATTTT CCAACGCGGC TTTTACGACA CCCGGGCCGG 

551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCTGAGCC GTGAAACGCG 

601 CCCGCCATAN ACGGGTTGTC TTCCNCCGCG TTGCAGAACA CGACGATTTT 

651 GGCGCAGCCG AAACCTTCTA GTGTGATTTC ANCCGTGCGT TTGATGGTTT 

701 CGCCCGCCAG TCTGACCGCG TCCATATTGA TACCGGCGCG CGTACTGCCG 

751 ATATTGATGG AGCTGCACAC GATATCAGTA GTCTTCATCG CTTCGGGAAT

801 GGAACGGATN AACACCTCGT CAGAAGGCGA CATACCTTTT TGCACCAGCG 

851 CGGAAAAGCC GCCAATAAAA GACACGCCGA TGGCTTTGGC AGCCTTATCC 

901 AAAGTTTGCG CCACGCTGAC GTAA 

This encodes a protein having amino acid sequence <SEQ ID 678>: 



1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIISXPTE QTAVIASSLS
51 NVSTPASAAA IIPSSSXTGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RIPNGIFSIF EASRPMSSPT
151 RVILKAVFFT TSATSVNVVA SEFSNAAFTT PGPDTPTLIT ASASPEP*NA
201 PAIXGLSSXA LQNTTILAQP KPSSVISXVR LMVSPASLTA SILIPARVLP
251 ILMELHTISV VFIASGMERX NTSSEGDIPF CTSAEKPPIK DTPMALAALS
301 KVCATLT*

It should be noted that this protein includes a stop codon at position 198. 
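An internal stop of this kind can be located by translating the nucleotide listing codon by codon. The following is a minimal sketch under stated assumptions: it uses the standard genetic code (whose codon assignments are shared by the bacterial translation table), reads from the first base in frame 1, and reports any codon containing an ambiguous base such as N as X, matching the convention of the listings here. The function names `translate` and `internal_stops` are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate frame 1 of a DNA string; unknown codons become 'X'."""
    seq = "".join(dna.split()).upper()  # drop layout spaces, accept lower case
    usable = len(seq) - len(seq) % 3    # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, usable, 3))


def internal_stops(protein: str) -> list[int]:
    """Return 1-based positions of '*' that are not the terminal stop."""
    body = protein[:-1] if protein.endswith("*") else protein
    return [pos for pos, aa in enumerate(body, start=1) if aa == "*"]


# Nucleotides 583-600 of SEQ ID 677 (codons 195-200), copied from the listing.
fragment = "CCTGAGCCGTGAAACGCG"
print(translate(fragment), internal_stops(translate(fragment)))
```

In this fragment the fourth codon is TGA, which corresponds to residue 198 of the full-length translation.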



ORF24a and ORF24-1 show 96.4% identity in 307 aa overlap: 



10 20 30 40 50 60 

orf24a.pep MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISXPTEQTAVIASSLSNVSTPASAAA

I I I I I I I I I I I II I I 1 I I I I I I I I I I I I I I I I I M I I I I I I 1 I : I I I I I : I II I I I I I I 
orf24-1    MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA

10 20 30 40 50 60 



70 80 90 100 110 120 

orf24a.pep IIPSSSXTGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP

orf24-1    IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP

70 80 90 100 110 120 



130 140 150 160 170 180
TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNVVASEFSNAAFTT
| I [ I I I I I I I I I I I I I i I M 1 M I 1 I 1 I M I 1 I I I I I I I 1 I I I I I I I I I I I I I I I I H I I 

TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNVVASEFSNAAFTT 

130 140 150 160 170 180 

190 200 210 220 230 240 

PGPDTPTLITASASPEPXNAPAIXGLSSXALQNTTILAQPKPSSVISXVRLMVSPASLTA

| | | | | | | II I I I II II I I I I I I I II I I : I I I I II M I I ! I! I : I ! I I i I I I I 

PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 
190 200 210 220 230 240 

250 260 270 280 290 300 

SILIPARVLPILMELHTISVVFIASGMERXNTSSEGDIPFCTSAEKPPIKDTPMALAALS

I | | | | [ [ I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II II : I I I I 1 I 1 I M I M I I I I 
SILIPARVLPILMELHTISVVFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS

250 260 270 280 290 300 



Homology with a predicted ORF from N. gonorrhoeae 

ORF24 shows 96.7% identity over a 121 aa overlap with a predicted ORF (ORF24ng) from
N. gonorrhoeae:

orf24.pep  MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA 60

I I II I II I I I I I I 1 II I I I I I I I I I I I I I I I I 1 I : I I I I I I I I I M I I I I I I : I I I I I I I 
orf24ng    MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIMSKPTEQTAVMASSLSSVNTPASAAA 60

orf24.pep  IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPIXSRMRATXSP 120

I || I I I I I I I I I I I II I I I I M I I I II I I I I I I I I M I i I I I I I I I I I I I I I I I I I II 
orf24ng    IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP 120

orf24.pep TG 122 
I : 

orf24ng    TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT 180

The complete length ORF24ng nucleotide sequence <SEQ ID 679> is: 



1 ATGCGCACGG CGGTGGTTTT GCTGTTGATC ATGCCGATGG CGGCTTCGTC
51 GGCGATGATG CCGGAAATGG TGTGCGCGGG CGTGTCGCCG GGAACGGCAA
101 TCATGTCCAA ACCAACGGAG CAGACGGCGG TCATGGCTTC GAGTTTGTCC
151 AGCGTCAACA CGCCTGCCTC GGCGGCGGCA ATCATACCTT CGTCTTCGGA
201 AACGGGGATA AACGCGCCGC TCAAACCGCC GACCGCGCTG GAAGCCATCA
251 TGCCGCCCTT TTTCACGGCA TCGTTCAGCA ATGCCAAAGC TGCTGTTGTG
301 CCGTGCGTAC CGCAGACGCT CAAGCCCATT TCTTCAAGAA TGCGCGCCAC
351 CGAGTCGCCG ACGGCGGGGG TCGGTGCCAG CGACAAATCG AGAATGCCGA
401 ACGGGATATT CAGCATTTTT GAGGCTTCGC GACCGATGAG TTCGCCCACG
451 CGGGTGATTT TGAAAGCGGT TTTCTTCACG ACTTCGGCGA CCTCGGTCAG
501 GCTGACCGCG TCCGAATTTT CCAGCGCGGC TTTGACCACG CCTGGACCGG
551 ATACGCCGAC ATTAATCACA GCATCCGCTT CGCCCGAGCC GTGGAACGCA
601 CCCGCCATAA ACGGATTGTC TTCCACCGCG TTGCAGAACA CGACGATTTT
651 GGCGCAGCCG AAACCTTCGG GTGTGATTTC AGCCGTGCGT TTGATGGTTT
701 CGCCTGCCAG CTTGACCGCA TCCATATTGA TACCGGCACG CGTGCTGCCG
751 ATATTGATGG AGCTGCACAC GATATCGGTA GTTTTCATCG CTTCGGGAAC
801 GGAACGGATC AACACCTCAT CCGAAGGCGA CATACCTTTT TGCACCAGCG
851 CGGAAAAGCC GCCGATAAAG GACACGCCGA TGGCTTTGGC TGCCTTGTCC
901 AAAGTCTGCG CCACGCTGAC ATAA

This encodes a protein having amino acid sequence <SEQ ID 680>:



1 MRTAVVLLLI MPMAASSAMM PEMVCAGVSP GTAIMSKPTE QTAVMASSLS
51 SVNTPASAAA IIPSSSETGI NAPLKPPTAL EAIMPPFFTA SFSNAKAAVV
101 PCVPQTLKPI SSRMRATESP TAGVGASDKS RMPNGIFSIF EASRPMSSPT
151 RVILKAVFFT TSATSVRLTA SEFSSAALTT PGPDTPTLIT ASASPEPWNA
201 PAINGLSSTA LQNTTILAQP KPSGVISAVR LMVSPASLTA SILIPARVLP
251 ILMELHTISV VFIASGTERI NTSSEGDIPF CTSAEKPPIK DTPMALAALS
301 KVCATLT*

ORF24ng and ORF24-1 show 96.1% identity in 307 aa overlap:

10 20 30 40 50 60 

orf24-1.pep MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIISKPTEQTAVMASSLSSVSTPASAAA
MMMIIIIIIIIIIIMMIIIM!IIMIII:IMIII1!I!IMI11I:IIMIM 
orf24ng     MRTAVVLLLIMPMAASSAMMPEMVCAGVSPGTAIMSKPTEQTAVMASSLSSVNTPASAAA



10 20 30 40 50 60



70 80 90 100 110 120 

IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP
| | | | | | | ! | | | I M I II I I I I I I I I I I I I I I I I I I M I M M I I I I I! M I I I I I M I I I 
IIPSSSETGINAPLKPPTALEAIMPPFFTASFSNAKAAVVPCVPQTLKPISSRMRATESP 

70 80 90 100 110 120 

130 140 150 160 170 180 

TAGVGASDKSRIPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVNVVASEFSNAAFTT
| | | | | | | | | M = II I I I M I I I I II I I I I I I I I I I M I I I I I I I I I : : I I I I I : I I : I 1 
TAGVGASDKSRMPNGIFSIFEASRPMSSPTRVILKAVFFTTSATSVRLTASEFSSAALTT

130 140 150 160 170 180 

190 200 210 220 230 240 

PGPDTPTLITASASPEPXNAPAINGLSSTALQNTTILAQPKPSGVISAVRLTVSPASLTA 

I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I I 

PGPDTPTLITASASPEPWNAPAINGLSSTALQNTTILAQPKPSGVISAVRLMVSPASLTA 

190 200 210 220 230 240 

250 260 270 280 290 300 

SILIPARVLPILMELHTISVVFIASGMERINTSSEGDIPFCTNAEKPPIKDTPMALAALS
II I I I II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I M II I I I I I 
SILIPARVLPILMELHTISVVFIASGTERINTSSEGDIPFCTSAEKPPIKDTPMALAALS

250 260 270 280 290 300 



orf24-1.pep KVCATLTX
orf24ng     KVCATLTX



Based on this analysis, including the presence of a putative leader sequence (first 18 aa - double- 
underlined) and putative transmembrane domains (single-underlined) in the gonococcal protein, 
it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could 
be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 81 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 681>:

1 . . ACCGACGTGC AAAAAGAGTT GGTCGGCGAA CAACGCAAGT GGGCGCAGGA 

51 AAAAATCAGC AACTGCCGAC AAGCCGCCGC GCAGGCAGAC CGGCAGGAAT 

101 ACGCCGAATA CCTCAAGCTG CAATGCGACA CGCGGATGAC GCGCGAACGG 

151 ATACAGTATC TTCGCGGCTA TTCCATCGAT TAG 

This corresponds to the amino acid sequence <SEQ ID 682; ORF25>: 

1 . . TDVQKELVGE QRKWAQEKIS NCRQAAAQAD RQEYAEYLKL QCDTRMTRER 
51 IQYLRGYSID * 

Further work revealed the complete nucleotide sequence <SEQ ID 683>: 



1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG 
51 CGGCAGGGAA GAACCGCCCA AGGCATTGGA ATGCGCCAAC CCCGCCGTGT
101 TGCAAGGCAT ACGCGGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT

151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT 

201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC 

251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG 

301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGT TGTACGGGGA 

351 AACTGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGTCAAAGAC

451 GGTCAGACGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT 

501 GTCTGCCGCG CTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 

551 GCAAGGCGGT GAAAAAAGAA GACGCGGTCA GGATTTTGAG CGGAAAAGCC 

601 CGTGAAGAAG AACCGTCCAA ACCCACGCCC GAAGACATTT TGGAACACAA 

651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCGCCCG 

701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC

751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AGCGTGCGGA 

801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 

851 AGTTGGTCGG CGAACAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC 

901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 

This corresponds to the amino acid sequence <SEQ ID 684; ORF25-1>:

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQGIRGN IQETLTQEAR
51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP
101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD
151 GQTAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRILSGKA
201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDGERADTVT
251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC
301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF25 shows 98.3% identity over a 60 aa overlap with an ORF (ORF25a) from strain A of N.
meningitidis:



10 20 30 

orf25.pep                               TDVQKELVGEQRKWAQEKISNCRQAAAQAD

I I I I I I I I I I I I I I I I I I M I I I II i I I I 
orf25a    VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNCRQAAAQAD
250 260 270 280 290 300 



40 50 60 

orf25.pep RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I 
orf25a    RQEYAEYLKLQCDTRMTRERIQYLRGYSIDX
310 320 330 

The complete length ORF25a nucleotide sequence <SEQ ID 685> is: 



1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCCGCTTG 

51 CGGCAGGGAA GAACCGCCCA AGG CAT T GG A ATGCGCCAAC CCCGCCGTGT 

101 TGCAANGCAT ACGCNGCAAT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT 

151 TCTTTCGCGC GCGAAGACNG CANGCAGTTT GTCGATGCCG ACNAAATTAT 

201 CGCCGCCGCC TANGNTNNGN NGNTNTCTTT GGAACACGCT TCGGAAACGC

251 AGGAAGGCGG GCGCACGTTC TGTNTCGCCG ATTTGAACAT TACCGTGCCG

301 TCTGAAACGC TTGCCGATGC CAAGGCAAAC AGCCCCCTGC TGTACGGGGA
351 AACCGCTTTG TCGGATATTG TGCGGCAGAA GACGGGCGGC AATGTCGAGT 

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTACC CGTCAAAGAC
451 GGTCAGANGG CATTTGTCGA CAACACGGTC GGTATGGCGG CGCAAACGCT
501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG 
551 GCAAGGCGGT AAAAAAAGAA GACGCGGTCA GGATTNTGAG CNGANAAGCC 
601 CGTGAANAAG AACCGTCCAA ANCCNNGCCC GAAGACATTT TGGAACATAA 
651 TGCCGCCGGA GGGGATGCAG ACGTACCCCA AGCCGGAGAA GACGCGCCCG 
701 AACCGGAAAT CCTGCATCCT GACGACGGCG AGCGTGCCGA TACCGTTACC
751 GTATCACGGG GCGAAGTGGA AGAGGCGCGN GTACAAAACC AGCGTGCGGA
801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG
851 AGTTGGTCGG CGAANAACGC AAGTGGGCGC AGGAAAAAAT CAGCAACTGC

901 CGACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 

951 GCTGCAATGC GACACGCGGA TGACGCGCGA ACGGATACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG

This encodes a protein having amino acid sequence <SEQ ID 686>:

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQXIRXN IQETLTQEAR

51 SFAREDXXQF VDADXIIAAA XXXXXSLEHA SETQEGGRTF CXADLNITVP

101 SETLADAKAN SPLLYGETAL SDIVRQKTGG NVEFKDGVLT AAVRFLPVKD

151 GQXAFVDNTV GMAAQTLSAA LLPYGVKSIV MIDGKAVKKE DAVRIXSXXA

201 REXEPSKXXP EDILEHNAAG GDADVPQAGE DAPEPEILHP DDGERADTVT

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEXR KWAQEKISNC

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 

ORF25a and ORF25-1 show 93.5% identity in 338 aa overlap: 

10 20 30 40 50 60 

15 orf 25a . pep MYRKLIALPFALLLAACGREEPPKALECANPAVLQXIRXNIQETLTQEARSFAREDXXQF 

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 II 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 II 

orf 25-1 MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF 
10 20 30 40 50 60 

20 70 80 90 100 110 120 

orf 25a pep VDADXIIAAAXXXXXSLEHASETQEGGRTFCXADLNITVPSETLADAKANSPLLYGETAL 

Ill I I M M I II I I I I I I I I I I I I I I I I 

orf 25-1 VDADKI I AAAYGLAFSLEHASETQEGGRTFCIADLNITVPSETLADAKAN SPLLYGETAL 

70 80 90 100 110 120 






130 140 150 160 170 180 

orf 25a . pep SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQXAFVDNTVGMAAQTLSAALLPYGVKSIV 
I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I II I I I I I I I I I 
o r f 2 5 - 1 SD I VRQKTGGNVE FKDGVLT AAVRFLPVKDGQTAFVDNT VGMAAQT LS AALL PYGVKS IV 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 25a . pep MIDGKAVKKEDAVRIXSXXAREXEPSKXXPEDILEHNAAGGDADVPQAGEDAPEPEILHP 

I I I I I I I I I I I I I I I I III I I I I : I I I I I 1 I I I I I I I I I M I : I MINIMI 

orf 25-1 MI DGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAE GAPE PEILHP 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 25a . pep DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEXRKWAQEKISNC 

II I II II II II II I I II II I II II I I I I II I I I I I I I I M M M II I I II II II I I II I 
orf 2 5-1 DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGE QRKWAQEKISNC 

250 260 270 280 290 300 

310 320 330 339 

orf 25a. pep RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 
I I I I I I I I I I II I II II I I II I M II II II II II II II I 
or f 2 5 - 1 RQAAAQADRQEYAEYLKLQCDTRMTRERI QYLRGYS I DX 

310 320 330 

Homology with a predicted ORF from N.gonorrhoeae 

ORF25 shows 100% identity over a 60aa overlap with a predicted ORF (ORF25ng) from 
N.gonorrhoeae: 

orf 25. pep T DVQKELVGEQRKWAQEKI SNCRQAAAQAD 30 

I I I II I II II II I M I II II II II II I II I 
orf25ng VTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNCRQAAAQAD 308 

orf 2 5. pep RQEYAEYLKLQCDTRMTRERIQYLRGYSID 60 

M M M II I I II II I I I I I M M M II I I I 

orf25ng RQEYAEYLKLQCDTRMTRERIQYLRGYSID 338 



The complete length ORF25ng nucleotide sequence <SEQ ID 687> is:

1 ATGTATCGGA AACTCATTGC GCTGCCGTTT GCCCTGCTGC TTGCAGCGTG

51 CGGCAGGGAA GAACCGCCCA AGGCGTTGGA ATGCGCCAAC CCCGCCGTGT 

101 TGCAGGACAT ACGCGGCAGT ATTCAGGAAA CGCTCACGCA GGAAGCGCGT

151 TCTTTCGCGC GCGAAGACGG CAGGCAGTTT GTCGATGCCG ACAAAATTAT 

201 CGCCGCCGCC TACGGTTTGG CGTTTTCTTT GGAACACGCT TCGGAAACGC
251 AGGAAGGCGG GCGCACGTTC TGTATCGCCG ATTTGAACAT TACCGTGCCG

301 TCTGAAACGC TTGCCGATGC CGAGGCAAAC AGCCCCCTGC TGTATGGGGA
351 AACGTCTTTG GCAGACATCG TGCAGCAGAA GACGGGCGGC AATGTCGAGT

401 TTAAAGACGG CGTATTGACG GCAGCCGTCC GCTTCCTGCC CGCCAAAGAC
451 GCTCGGACGG CATTTATCGA CAACACGGTC GGTATGGCGA CGCAAACGCT
501 GTCTGCCGCG TTGCTGCCTT ACGGCGTGAA GAGCATCGTG ATGATAGACG
551 GCAAGGCGGT GACAAAAGAA GACGCGGTCA GGGTTTTGAG CGGCAAAGCC 
601 CGTGAAGAAG AACCGTCCAA ACCCACCCCC GAAGACATTT TGGAACACAA 
651 TGCCGCCGGC GGCGATGCGG GCGTACCCCA AGCCGCAGAA GGCGCACCCG 
701 AACCCGAAAT CCTGCATCCC GACGACGTCG AGCGTGCCGA TACCGTTACC 
751 GTATCACGGG GCGAAGTGGA AGAGGCGCGC GTACAAAACC AACGTGCGGA
801 ATCCGAAATT ACCAAACTTT GGGGAGGACT CGATACCGAC GTGCAAAAAG 
851 AGTTGGTCGG CGAACAGCGC AAGTGGGCGC AGGAAAAAAT CAGcaactgc 
901 cgACAAGCCG CCGCGCAGGC AGACCGGCAG GAATACGCCG AATACCTCAA 
951 GCTCCAATGC GACACGCGGA TGACGCGCGA ACggaTACAG TATCTTCGCG 

1001 GCTATTCCAT CGATTAG 

This encodes a protein having amino acid sequence <SEQ ID 688>: 

1 MYRKLIALPF ALLLAACGRE EPPKALECAN PAVLQDIRGS IQETLTQEAR

51 SFAREDGRQF VDADKIIAAA YGLAFSLEHA SETQEGGRTF CIADLNITVP

101 SETLADAEAN SPLLYGETSL ADIVQQKTGG NVEFKDGVLT AAVRFLPAKD

151 ARTAFIDNTV GMATQTLSAA LLPYGVKSIV MIDGKAVTKE DAVRVLSGKA

201 REEEPSKPTP EDILEHNAAG GDAGVPQAAE GAPEPEILHP DDVERADTVT

251 VSRGEVEEAR VQNQRAESEI TKLWGGLDTD VQKELVGEQR KWAQEKISNC 

301 RQAAAQADRQ EYAEYLKLQC DTRMTRERIQ YLRGYSID* 

ORF25ng and ORF25-1 show 95.9% identity in 338 aa overlap: 

10 20 30 40 50 60 

MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEARSFAREDGRQF 
I I I I I I I I I I I I M I I I I M I I I I I I I I I I I I I I I I I I = I I I I 1 I I I I I I I I I I I I I I I 
MYRKLIALPFALLLAACGREEPPKALECANPAVLQDIRGSIQETLTQEARSFAREDGRQF 
10 20 30 40 50 60 

70 80 90 100 110 120 

VDADKI I AAA YGLAFSLEHASETQEGGRT FC IADLNITVPSETLADAKANSPLLYGETAL 

VDADKI I AAAYGLAFSLEHASETQEGGRT FC IADLNITVPSETLADAEANSPLLYGETSL 
70 80 90 100 110 120 

130 140 150 160 170 180 

SDIVRQKTGGNVEFKDGVLTAAVRFLPVKDGQTAFVDNTVGMAAQTLSAALLPYGVKSIV 
: I I I : I I I I I I I I I I I I I I I I I I M I I : I I I ! I : I I I I I I I : I I 1 I I II I I I I I I I I I 
ADIVQQKTGGNVEFKDGVLTAAVRFLPAKDARTAFIDNTVGMATQTLSAALLPYGVKSIV 

130 140 150 160 170 180 

190 200 210 220 230 240 

MIDGKAVKKEDAVRILSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPE PEILHP 

I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 
MI DGKAVTKEDAVRVLSGKAREEEPSKPTPEDILEHNAAGGDAGVPQAAEGAPE PEILHP 

190 200 210 220 230 240 

250 260 270 280 290 300 

DDGERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 

II I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I II ! I II I I I I I I I I I I 
DDVERADTVTVSRGEVEEARVQNQRAESEITKLWGGLDTDVQKELVGEQRKWAQEKISNC 

250 260 270 280 290 300 

310 320 330 339 

RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 

RQAAAQADRQEYAEYLKLQCDTRMTRERIQYLRGYSIDX 
310 320 330 



In each row pair of the alignment above, the upper sequence is orf25-1.pep and the lower sequence is orf25ng.

Based on this analysis, including the presence of a predicted prokaryotic membrane lipoprotein
lipid attachment site (underlined) in the gonococcal protein, it was predicted that the proteins from
N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or
diagnostics, or for raising antibodies.
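A lipoprotein lipid attachment site of this kind is commonly located by pattern matching near the N-terminus. The sketch below is an assumption-laden illustration, not the analysis actually used here: the regular expression is written from the consensus usually quoted for the PROSITE prokaryotic membrane lipoprotein signature, the restriction of the search to the first 35 residues is a simplified reading of that signature's positional rule, and the name `find_lipobox_cysteine` is hypothetical.

```python
import re

# Assumed consensus (PROSITE-style, quoted from memory and not taken from the
# source): six residues that are not D, E, R or K, then two residues from
# LIVMFWSTAG, one from LIVMFYSTAGCQ, one from AGS, and the lipidated cysteine.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def find_lipobox_cysteine(protein: str, max_pos: int = 35):
    """Return the 1-based position of the candidate lipidated cysteine, or None.

    Only the first `max_pos` residues are searched (a simplifying assumption).
    """
    match = LIPOBOX.search(protein[:max_pos])
    return match.end() if match else None


# First 35 residues of the gonococcal protein (SEQ ID 688).
n_term = "MYRKLIALPFALLLAACGREEPPKALECANPAVLQ"
site = find_lipobox_cysteine(n_term)
print(site)
```

Under these assumptions the match ends at the cysteine in ALLLAAC, which is residue 17 of the listed sequence.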

ORF25-1 (37kDa) was cloned in pET and pGEX vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
16A shows the results of affinity purification of the GST-fusion protein, and Figure 16B shows the
results of expression of the His-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for Western blot (Figure 16C), ELISA (positive result), and FACS
analysis (Figure 16D). These experiments confirm that ORF25-1 is a surface-exposed protein, and
that it is a useful immunogen.

Figure 16E shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF25-1. 
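Profiles of this kind are typically sliding-window averages of a per-residue scale. The source does not say which scale or window was used for the figure, so the sketch below substitutes the Kyte-Doolittle hydropathy scale (where higher values mean more hydrophobic, the opposite sense to a hydrophilicity plot) and a nine-residue window purely for illustration. The function name `hydropathy_profile` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed stand-in for whatever scale
# produced the figure; positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 9) -> list:
    """Mean hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# First 50 residues of ORF25-1 (SEQ ID 684).
orf25_1_start = "MYRKLIALPFALLLAACGREEPPKALECANPAVLQGIRGNIQETLTQEAR"
profile = hydropathy_profile(orf25_1_start)

# Window starting at residue 6 (IALPFALLL) versus residue 16 (ACGREEPPK).
print(round(profile[5], 2), round(profile[15], 2))
```

On this scale the window covering the leader region (IALPFALLL) scores well above the charged stretch that follows it (ACGREEPPK), which is the kind of contrast such a plot makes visible.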

Example 82

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 689>:

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGwysGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGsyGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CkGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA T 

// 

851 AC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT CTTTGCCGTC GTTCTCTGCA CGCTCGGCAC 

951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT

1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT

1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT

1501 AAAAAA. . 

This corresponds to the amino acid sequence <SEQ ID 690; ORF26>: 



// 

251 TSLV 

301 FGGTCGVFAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV
351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI
451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD
501 KK. . 



Further work revealed the complete nucleotide sequence <SEQ ID 691>:

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC 

401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC

451 CGCACCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCTC CTATGTGCGT

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT 

601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT 

651 GTTCGTCGTC GCATGGTTTT CCTTCGACAT CGGCTCGATG GCACGTTTCG 

701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGCT 

751 ACCAAAGGTC GTGTTTACGC ACTGATTATT CCCGTTTTGG CCTTAATCGC

801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGGGCATTT GAAAACACGG ACGTAAACAC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT CCTTGCCGTC GTTCTCTGCA CGCTCGGCAC 

951 GATTAAAACC GCCGACTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCTA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CTTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACCGG CGATTACCTC TCCACACTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTATCCCGTG 

1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CCTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT

1451 TTGGCACGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCAACGCCTG A 

This corresponds to the amino acid sequence <SEQ ID 692; ORF26-1>:

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV

51 DGLTHLKDMV VGLAWSDGDW SLGKPKILVF LILLGIFTSL LTYSGSNQAF

101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAIA RPVTDKFKVS

151 RTKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDA

251 TKGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV

301 FGGTCGVLAV VLCTLGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD

501 KKRANA* 

Computer analysis of this amino acid sequence gave the following results: 



Homology with the hypothetical transmembrane protein HI1586 of H. influenzae (accession number P44263) 
ORF26 and HI1586 show 53% and 49% amino acid identity in 97 and 221 aa overlap at the 
N-terminus and C-terminus, respectively: 

Orf2 6 1 MQLIDYSHSFFSVVPPFLALALAVITRRVXXXXXXXXXXXVAFLVGGNPVDGLTHLKDMV 60 

M+LID+S S +S+VP LA+ LA+ TRRV L +L V 

HI1586 14 MELI D FS S SVWS IVPALLAI ILAI ATRRVLVSLS AGI 1 1 GS LMLS DWQI GS AFNYLVKNV 73 

Orf2 6 61 VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 97 

V L ++D + + I++F +LLG+ T+LLT SGSN 

HI1586 74 VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTVSGSN 109 

// 

Orf2 6 8 6 IFTSLLTYSGS — NTSLVFGGTCGVFAWLCTL — GTIKTADYPKAVWQGAKSMFGXXXX 141 

+F+ L T+ + TSLV GG C + L + + +Y ++ G KSM G 
HI1586 299 VFSVLGTFENTWGTSLWGGFCSIIISTLLIILDRQVSVPEYVRSWIVGIKSMSGAIAI 358 
Orf26 142 XXXXXXXSTVVGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLP 201

+ +VG+M TG YLS+LV+GNI FLPVILF+L + MAF+TGT SWGT FGIMLP 
HI1586 359 LFFAWTINKIVGDMQTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGT SWGT FGIMLP 418 

Orf26 202 IAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQXXXX 261

IAAAMA P L++PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q 
HI1586 419 IAAAMAANAAPELLLPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYA 478 

Orf2 6 2 62 XXXXXXXXXXXXXXXXXKSALLGFGTTGIVLAVLIFLLKDK 302 
IQ S L GF T + L V+IF +K + 

HI1586 479 ATVATATSIGYIWGFTYSGLAGFAATAVSLIVIIFAVKKR 519 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF26 shows 58.2% identity over a 502aa overlap with an ORF (ORF26a) from strain A of N.
meningitidis:

10 20 30 40 50 60 

orf2 6 pep MQLIDYSHSFFSVVPPFLALA LAVITRR VLLSLGIGILXXVAFLV GGNPVDGLTHLKDMV 
I M M I I !! I I I I I ![! 1 I I I ! I I I I I I I I I I I I I M I ! I I I I I I I I I I I I I I M 1 I I 
or f26a MQLIDYSHSFFSWPPFLALA LAVITRR VLLSLGIGILVGVAFLV GGNPVDGLTHLKDMV 
20 10 20 30 40 50 60 

70 80 90 99 

orf26 .pep VGLAWSDXDWSLGKPK ILVFXILLGIFTSLLTY SGSNXX 

I I I I I I I I I I 1 I I I I III I I I I I I I I I I I I I ! I I 
25 orf26a VGLAWSDGDWSLGKPK XLVFLILLGIFTSLLTY SGSNQAFADWAKRHIKNR RGAKMLTAC 

70 80 90 100 110 120 

orf26.pep 

30 

orf 2 6a LVFVTFID DYFHSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMP VSSWGASIIA 
130 140 150 160 170 180 

35 orf26.pep 

orf 2 6a TLAGLLV TYKITEYTPMGTFVAMSLMNYY ALFALIMVFWAWFSFDI GSMARFEQAALNE 
190 200 210 220 230 240 

40 ioo no 

orf 2 6. pep TSLV 

I I I I 

orf 2 6a AHDETAVSDGSWGRVY ALIIPVLALIASTVSAMI YTGAQASETFSILGAFENTDVNTSLV 
250 260 270 280 290 300 

45 

120 130 140 150 160 170 

orf 2 6 . pep FGGTCGVFAWLCTL GTIKTADYPKAVWQGAKSM FGAIAILILAWLISTW GEMHTGDYL 
I I I II I [ : I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I II 
or f 2 6a FGGTCGVLAWLCT L GT IKI ADYPKAVWQGAKSM FGAI AI L I LAW LI STW GEMHTGDYL 

50 310 320 330 340 350 360 

180 190 200 210 220 230 

or f 2 6 . pep STLVAGNIHP GFLPVILFLLASVMAFA TGTSW GTFGIMLPIAAAMAVKV EP ALIIPCMSA 
I II I I I I I I M I 1 I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I : I : I I I I I I I I 
55 orf 2 6a STLVAGNIHP GFLXVILFLLASVMAFA TGT SW GTFGIMLPIAAAMAVKV DP SLIIPCMSA 

370 380 390 400 410 420 

240 250 260 270 280 290 

orf 2 6. pep VMAGAVCG DHCSPISDTTILSSTGARCNHIDHVTSQLPY ALTVAAAAASGYLALGL TKSA 
60 I I I I I II II I I I I I I I I I I I I I I I I I I 1 11 I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 

orf 2 6a VMAGAVCG DHCSPISDTTILSSTGARCNHIDHVTSQLPY ALTVAAAAASGYLALGL TKSA 
430 440 450 460 470 480 

300 310 
65 orf 2 6. pep LLGFGTTGIVLAVLIFL LKDKK 

I I I I I : I I I I I II I I I I I II I I 
orf26a LLGFGXTGIVLAVLIFLLKDKKRANAX
490 500 

The complete length ORF26a nucleotide sequence <SEQ ID 693> is: 

1 ATGCAGCTGA TCGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT 

51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG 

101 GCATCGGTAT TCTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC 

151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGTCAGA 

201 CGGCGATTGG TCGCTGGGCA AACCAAAANT CTTGGTTTTC CTGATACTTT 

251 TGGGTATTTT TACTTCCCTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT 

301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGCGCGGCG CGAAAATGCT 

351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGTC 

401 TCGCCGTCGG TGCGNTTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC 

451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCGCGC CTATGTGCGT 

501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG 

551 GACTGCTCGT TACCTACAAA ATCACCGAAT ACACGCCGAT GGGGACGTTT 

601 GTCGCCATGA GCCTGATGAA CTATTACGCA CTGTTTGCCC TGATTATGGT 

651 GTTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGATG GCACGTTTCG 

701 AACAAGCCGC GTTGAACGAA GCCCACGATG AAACTGCCGT TTCAGACGGC 

751 AGCTGGGGCA GGGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC 

801 CTCAACGGTT TCCGCCATGA TCTACACCGG TGCACAGGCA AGCGAAACCT 

851 TCAGCATTTT GGGTGCATTT GAAAATACGG ACGTGAACAC TTCGCTGGTA 

901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGCTCGGCAC 

951 GATTAAAATC GCCGATTATC CCAAAGCCGT TTGGCAGGGT GCGAAATCCA 

1001 TGTTCGGCGC AATCGCCATT TTAATCCTTG CCTGGCTCAT CAGTACGGTT 

1051 GTCGGCGAAA TGCACACAGG CGACTACCTC TCCACGCTGG TTGCGGGCAA 

1101 CATCCATCCC GGCTTCCTGN CCGTCATCCT TTTCCTGCTC GCCAGCGTGA 

1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT CATGCTGCCG 

1201 ATTGCCGCCG CCATGGCGGT CAAAGT CGAT CCCTCACTGA TTATCCCGTG 

1251 TATGTCCGCC GTGATGGCGG GGGCGGTATG CGGCGACCAC TGCTCGCCCA 

1301 TTTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC 

1351 GACCACGTTA CNTCGCAACT GCCTTACGCC TTAACCGTTG CCGCCGCCGC 

1401 CGCATCGGGN TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGTT 

1451 TTGGCANGAC AGGCATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT 

1501 AAAAAACGCG CCAACGCCTG A 

This encodes a protein having amino acid sequence <SEQ ID 694>: 



1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV

51 DGLTHLKDMV VGLAWSDGDW SLGKPKXLVF LILLGIFTSL LTYSGSNQAF

101 ADWAKRHIKN RRGAKMLTAC LVFVTFIDDY FHSLAVGAXA RPVTDKFKVS

151 RAKLAYILDS TAAPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AHDETAVSDG

251 SWGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV

301 FGGTCGVLAV VLCTLGTIKI ADYPKAVWQG AKSMFGAIAI LILAWLISTV

351 VGEMHTGDYL STLVAGNIHP GFLXVILFLL ASVMAFATGT SWGTFGIMLP

401 IAAAMAVKVD PSLIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGXTGIV LAVLIFLLKD

501 KKRANA* 

ORF26a and ORF26-1 show 97.8% identity in 506 aa overlap: 



10 20 30 40 50 60 

orf 2 6a . pep MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 

1 1 1 1 n n 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

orf 2 6-1 MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 
10 20 30 40 50 60 



70 80 90 100 110 120 

orf 2 6a. pep VGLAWSDGDWSLGKPKXLVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC 
I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I M I I I II I I II I I M I I II I I I I I I I I I I 
orf 2 6-1 VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC 

70 80 90 100 110 120 



130 140 150 160 170 180 

orf 2 6a . pep LVFVTFIDDYFHSLAVGAXARPVTDKFKVSRAKLAYILDSTAAPMCVLMPVSSWGASIIA 
I I I I I I I I I I I II II I II I I 1 I I I I I I I II : I I I I I I I I II I I I I I I I I 1 I I I! II II I 
orf 2 6-1 LVFVTFI DDYFHSLAVGAI ARPVTDKFKVSRTKLAYI LD STAAPMCVLMPVS SWGAS 1 1 A 
130 140 150 160 170 180

In the unlabelled row pairs that follow, the upper sequence is orf26a.pep and the lower sequence is orf26-1.

190 200 210 220 230 240 

TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFWAWFSFDIGSMARFEQAALNE 

| | | | | | I M | I I I I I I I II I I ! I I I I I I M I Ml II I I I I I I 1 I 1 I 

TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFWAWFSFDIGSMARFEQAALNE 

190 200 210 220 230 240 

250 260 270 280 290 300 

AHDETAVSDGSWGRVYALIIPVLALIAS TVSAMIYTGAQASETFSILGAFENTDVNTSLV 
i I ! [ I I I I I : : I I I I II I I I I M II I I I I I I I I I M II 1 M 1 I I I M I i II M I I I I I I 
AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 

250 260 270 280 290 300 

310 320 330 340 350 360 

FGGTCGVLAWLCTLGTIKIADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL 
| I I I I I M I I I I I I I I II I I I I I I I I I I I I I I I I I I I I M I I II I M M I I I I I I II II 
FGGTCGVLAVVLCTLGT IKTADYPKAVWQGAKSMFGAI AI LI LAWL I STWGEMHTGDYL 

310 320 330 340 350 360 

370 380 390 400 410 420 

STLVAGNIHPGFLXVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVDPSLIIPCMSA 
| | | I I I I I I I I II I I I I I I I I I I I I I I M II I I I I I I I I I I I I I I I I I : I : I I I I I I I I 
STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA 

370 380 390 400 410 420 

430 440 450 460 470 480 

VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA 
I I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA 

430 440 450 460 470 480 

490 500 
LLGFGXTGIVLAVLIFLLKDKKRANAX 
I I I I I : I I I I I I I I I I I I I I I I I I I I I 
LLGFGT TGIVLAVL I FLLKDKKRANAX 

490 500 



Homology with a predicted ORF from N.gonorrhoeae 

ORF26 shows 94.8% and 99% identity in 97 and 206 aa overlap at the N-terminus and C-terminus, 
respectively, with a predicted ORF (ORF26ng) from N. gonorrhoeae: 

orf 2 6. pep MQL I DY SH S F FSWPP FLAL ALAVI TRRVLL SLG I G I LXXVAFLVGGNPVDGLT HLKDMV 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M II I I I I I I I I I I I I I I I 
or f 2 6ng MQLIDYSHSFFSWPPFLAL ALAVI TRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV 60 

orf 2 6. pep VGLAWSDXDWSLGKPKILVFXILLGIFTSLLTYSGSN 97 

orf2 6ng VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 12 0 



rf2 6.pep TSLVFGGTCGVFAWLCTLGTIKTADYPKA 32 6 

rf2 6ng ASTVSAMIYTGAQASETFSILGAFENTDVNTSLVFGGTCGVLAWLCTFGTIKTADYPKA 32 6 

rf26.pep VWQGAKSMFGAIAILILAWLISTWGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAF 38 6 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II 

rf26ng VWQGAKSMFGAIAILILAWLISTWGEMHTGDYLSTLVAGNIHPGFLPVILFLLASVMAF 38 6 

rf 2 6 . pep ATGT S WGT FG IMLP I AAAMAVKVE PAL 1 1 PCMS AVMAGAVCGDHC SPISDTTILS S TGAR 44 6 

I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I II II II II I I I I II II I M I I 

rf2 6ng ATGTSWGT FGIMLPIAAAMAVKVEPALIIPCMSAVMAGAVCGDHCSPISDTTILSSTGAR 44 6 

rf 2 6 . pep CNHIDHVTSQLPYALTVAAAAASGYLALGLTKSALLGFGTTGIVLAVLIFLLKDKK 502 

I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I II 1 I I I I I II I I I I 

r f 2 6ng CNHIDHVTSQLPYALTVAAAAASGYLALGLTKSALLGFGTTGIVLAVLIFLLKDKKRADV 50 6 
The complete length ORF26ng nucleotide sequence <SEQ ID 695> is:

1 ATGCAGCTGA TTGACTATTC ACATTCATTT TTCTCGGTTG TGCCACCCTT
51 TTTGGCACTG GCACTTGCCG TCATTACCCG CCGCGTACTG CTGTCTTTAG
101 GCATCGGTAT TTTGGTCGGC GTTGCCTTTT TGGTCGGCGG CAACCCCGTC
151 GACGGTCTGA CACACCTGAA AGACATGGTC GTCGGCTTGG CTTGGGCAGA
201 CGGCGATTGG TCGCTGGGCA AACCAAAAAT CTTGGTTTTC CTGATACTTT
251 TGGGCATTTT CACTTCACTG CTGACCTACT CCGGCAGCAA TCAGGCGTTT
301 GCCGACTGGG CAAAACGGCA CATTAAAAAC CGGTGCGGCG CGAAAATGCT
351 GACCGCCTGC CTCGTGTTCG TAACCTTTAT CGACGACTAT TTCCACAGCC
401 TCGCCGTCGG TGCGATTGCC CGCCCCGTTA CCGACAAGTT TAAAGTTTCC
451 CGCGCCAAAC TCGCCTACAT CCTCGACTCC ACTGCCTCGC CCATGTGCGT
501 GCTGATGCCC GTTTCAAGCT GGGGCGCGTC GATTATCGCC ACGCTTGCCG
551 GATTGCTCGT TACCTACAAA ATTACCGAAT ACACGCCGAT GGGGACGTTT
601 GTCGCCATGA GCCTGATGAA CTATTACGCG CTGTTTGCCC TGATTATGGT
651 ATTCGTCGTC GCATGGTTCT CCTTCGACAT CGGCTCGAtg gCGCGTTTCG
701 AACAGGCTGC GTTGAACGAA gcccaggacg aaaccgccgc tTCAGACgCT
751 ACCAAAGGTC GTGTTTACGC ATTGATTATT CCCGTTTTGG CCTTAATCGC
801 CTCAACGGTT TCCGCCATGA TCTACACCGG CGCGCAGGCA AGCGAAACCT
851 TCAGCATTTT GGGGGCATTT GAAAATACCG ACGTAAACAC TTCGCTGGTA
901 TTCGGCGGCA CTTGCGGCGT GCTTGCCGTC GTCCTCTGCA CGTTCGGCAC
951 GATTAAAACC GCCGATTATC CCAAAGCCGT GTGGCAGGGT GCGAAATCCA
1001 TGTTCGGCGC AATCGCCATT TTAATCCTCG CCTGGCTCAT CAGTACGGTT
1051 GTCGGCGAAA TGCACACGGG CGACTACCTC TCCACGCTGG TTGCGGGCAA
1101 CATCCATCCC GGCTTCCTGC CCGTCATCCT CTTCCTGCTC GCCAGCGTGA
1151 TGGCGTTTGC CACAGGCACA AGCTGGGGGA CGTTCGGCAT TATGCTGCCG
1201 ATTGCCGCCG CCATGGCGGT CAAAGTCGAA CCCGCGCTGA TTAtCCCGTG
1251 TATGTCCGCA GTAATGGCGG GGGCGGTATG CGGCGACCAC TGTTCGCCCA
1301 TCTCCGACAC GACCATCCTG TCGTCCACCG GCGCGCGCTG CAACCACATC
1351 GACCACGTTA CCTCGCAACT GCCTTATGCC CTGACGGTTG CCGCCGCCGC
1401 CGCATCGGGC TACCTCGCAT TGGGTCTGAC AAAATCCGCG CTGTTGGGCT
1451 TTGGCACGAC CGGTATTGTA TTGGCGGTGC TGATTTTTCT GTTGAAAGAT
1501 AAAAAACGCG CCGACGTTTG


This encodes a protein having amino acid sequence <SEQ ID 696>: 

1 MQLIDYSHSF FSVVPPFLAL ALAVITRRVL LSLGIGILVG VAFLVGGNPV

51 DGLTHLKDMV VGLAWADGDW SLGKPKILVF LILLGIFTSL LTYSGSNQAF

101 ADWAKRHIKN RCGAKMLTAC LVFVTFIDDY FHSLAVGAIA RPVTDKFKVS

151 RAKLAYILDS TASPMCVLMP VSSWGASIIA TLAGLLVTYK ITEYTPMGTF

201 VAMSLMNYYA LFALIMVFVV AWFSFDIGSM ARFEQAALNE AQDETAASDA

251 TKGRVYALII PVLALIASTV SAMIYTGAQA SETFSILGAF ENTDVNTSLV

301 FGGTCGVLAV VLCTFGTIKT ADYPKAVWQG AKSMFGAIAI LILAWLISTV

351 VGEMHTGDYL STLVAGNIHP GFLPVILFLL ASVMAFATGT SWGTFGIMLP

401 IAAAMAVKVE PALIIPCMSA VMAGAVCGDH CSPISDTTIL SSTGARCNHI

451 DHVTSQLPYA LTVAAAAASG YLALGLTKSA LLGFGTTGIV LAVLIFLLKD

501 KKRADV* 

ORF26ng and ORF26-1 show 98.4% identity in 505 aa overlap: 



                     10        20        30        40        50        60
orf26-1.pep  MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng      MQLIDYSHSFFSVVPPFLALALAVITRRVLLSLGIGILVGVAFLVGGNPVDGLTHLKDMV
                     10        20        30        40        50        60

                     70        80        90       100       110       120
orf26-1.pep  VGLAWSDGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRRGAKMLTAC
             |||||:||||||||||||||||||||||||||||||||||||||||||||| ||||||||
orf26ng      VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC
                     70        80        90       100       110       120

                    130       140       150       160       170       180
orf26-1.pep  LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRTKLAYILDSTAAPMCVLMPVSSWGASIIA
             |||||||||||||||||||||||||||||||:||||||||||:|||||||||||||||||
orf26ng      LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRAKLAYILDSTASPMCVLMPVSSWGASIIA
                    130       140       150       160       170       180

                    190       200       210       220       230       240
orf26-1.pep  TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf26ng      TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE
                    190       200       210       220       230       240

250 260 270 280 290 300 

orf 2 6-1 pep AHDETAVSDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 
I : I 1 I I : I I ! I I I I I I I I I I I I M I I I I I I I I I I I I I I I M I I I I M I I I I I I 1 I I I I I I 
orf26nq AQDETAASDATKGRVYALIIPVLALIASTVSAMIYTGAQASETFSILGAFENTDVNTSLV 

250 260 270 280 290 300 

310 320 330 340 350 360 

o r f 2 6 - 1 pep FGGTCGVLAVVLCTLGT IKTADYPKAVWQGAKSMFGAI AI LI LAWL I ST WGEMHTGDYL 

I I I I M I I I I I I I I : I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I 

orf26ng FGGTCGVLAWLCTFGTIKTADYPKAVWQGAKSMFGAIAILILAWLISTVVGEMHTGDYL 

310 320 330 340 350 360 

370 380 390 400 410 420 

orf 2 6-1 pep STLVAGNIHPGFLPVILFLLASVMAFATGTSWGT FGIMLPIAAAMAVKVEPALI IPCMSA 

I I I I I II I I I I I I I I I I I I M I ! I I I I I I I I I i I I I I I I I I I I M I I I I I I I I I 

orf2 6ng STLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALIIPCMSA 
370 380 390 400 410 420 

430 440 450 460 470 480 

orf 26-1 .pep VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA 
I I I I II I I I I I I I I I I I I I I I I I I I I M I I I I I I M I I i M I II I I I I I I I I I II I I I II 
orf26ng VMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQLPYALTVAAAAASGYLALGLTKSA 

430 440 450 460 470 480 



30 490 500 

orf 26-1 . pep LLGFGTTGIVLAVLIFLLKDKKRANAX 

I I II I I I I I I I I I I I I I I I I I I I I : : 
orf2 6ng LLGFGTTGIVLAVLIFLLKDKKRADVX 
490 500 

In addition, ORF26ng shows significant homology to a hypothetical H. influenzae protein:

sp|P44263|YF86_HAEIN HYPOTHETICAL PROTEIN HI1586 >gi|1074850|pir||C64037 hypothetical
protein HI1586 - Haemophilus influenzae (strain Rd KW20) >gi|1574427 (U32832) H.
influenzae predicted coding region HI1586 [Haemophilus influenzae] Length = 519
Score = 538 bits (1370), Expect = e-152

Identities = 280/507 (55%), Positives = 346/507 (68%), Gaps = 7/507 (1%)





Query: 1   MQLIDYSHSFFSVVPPFLALALAVITRRXXXXXXXXXXXXXAFLVGGNPVDGLTHLKDMV 60
           M+LID+S S +S+VP  LA+ LA+ TRR               L          +L   V
Sbjct: 14  MELIDFSSSVWSIVPALLAIILAIATRRVLVSLSAGIIIGSLMLSDWQIGSAFNYLVKNV 73

Query: 61  VGLAWADGDWSLGKPKILVFLILLGIFTSLLTYSGSNQAFADWAKRHIKNRCGAKMLTAC 120
           V L +ADG+ +     I++FL+LLG+ T+LLT SGSN+AFA+WA+  IK R GAK+L A
Sbjct: 74  VSLVYADGEIN-SNMNIVLFLLLLGVLTALLTVSGSNRAFAEWAQSRIKGRRGAKLLAAS 132

Query: 121 LVFVTFIDDYFHSLAVGAIARPVTDKFKVSRAKLAYILDSTASPMCVLMPVSSWGASIIA 180
           LVFVTFIDDYFHSLAVGAIARPVTD+FKVSRAKLAYILDSTA+PMCV+MPVSSWGA II
Sbjct: 133 LVFVTFIDDYFHSLAVGAIARPVTDRFKVSRAKLAYILDSTAAPMCVMMPVSSWGAYIIT 192

Query: 181 TLAGLLVTYKITEYTPMGTFVAMSLMNYYALFALIMVFVVAWFSFDIGSMARFEQAALNE 240
            + GLL TY ITEYTP+G FVAMS MN+YA+F++IMVF VA+FSFDI SM R E+ AL
Sbjct: 193 LIGGLLATYSITEYTPIGAFVAMSSMNFYAIFSIIMVFFVAYFSFDIASMVRHEKLALKN 252

Query: 241 AQDETAASDATKGRVYALIIPVLALIASTVSAMIYTGAQA----SETFSILGAFENTDVN 296
            +D+      TKG+V  LI+P+L LI +TVS MIYTGA+A     + FS+LG FENT V
Sbjct: 253 TEDQLEEETGTKGQVRNLILPILVLIIATVSMMIYTGAEALAADGKVFSVLGTFENTVVG 312

Query: 297 TSLVFGGTCGVL--AVVLCTFGTIKTADYPKAVWQGAKSMFGXXXXXXXXXXXSTVVGEM 354
           TSLV GG C ++   +++     +   +Y ++   G KSM G           + +VG+M
Sbjct: 313 TSLVVGGFCSIIISTLLIILDRQVSVPEYVRSWIVGIKSMSGAIAILFFAWTINKIVGDM 372

Query: 355 HTGDYLSTLVAGNIHPGFLPVILFLLASVMAFATGTSWGTFGIMLPIAAAMAVKVEPALI 414
            TG YLS+LV+GNI   FLPVILF+L + MAF+TGTSWGTFGIMLPIAAAMA    P L+
Sbjct: 373 QTGKYLSSLVSGNIPMQFLPVILFVLGAAMAFSTGTSWGTFGIMLPIAAAMAANAAPELL 432

Query: 415 IPCMSAVMAGAVCGDHCSPISDTTILSSTGARCNHIDHVTSQXXXXXXXXXXXXXXXXXX 474
           +PC+SAVMAGAVCGDHCSP+SDTTILSSTGA+CNHIDHVT+Q
Sbjct: 433 LPCLSAVMAGAVCGDHCSPVSDTTILSSTGAKCNHIDHVTTQLPYAATVATATSIGYIVV 492

Query: 475 XXXKSALLGFGTTGIVLAVLIFLLKDK 501
               S L GF  T + L V+IF +K +
Sbjct: 493 GFTYSGLAGFAATAVSLIVIIFAVKKR 519
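The Identities, Positives and Gaps figures in the BLAST summary are simple ratios over the alignment length and can be re-derived from the counts. The sketch below parses the summary line and recomputes each percentage; the function name `parse_blast_stats` is hypothetical, and the use of integer truncation is an assumption that happens to reproduce the reported values here.

```python
import re


def parse_blast_stats(line: str) -> dict:
    """Extract (count, alignment_length, reported_percent) for each statistic."""
    pattern = r"(\w+) = (\d+)/(\d+) \((\d+)%\)"
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in re.findall(pattern, line)
    }


summary = ("Identities = 280/507 (55%), Positives = 346/507 (68%), "
           "Gaps = 7/507 (1%)")
stats = parse_blast_stats(summary)

# Recompute each percentage by integer truncation (assumed convention).
recomputed = {
    name: 100 * count // length for name, (count, length, _) in stats.items()
}
print(recomputed)
```

The seven gap columns are the single gap in the subject row at position 74-132 plus the gap columns in the query rows, so the alignment length of 507 exceeds the 501 query residues aligned.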

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 83 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 697>:

1 . . AAGCAATGGT ATGCCGACGN . AGTATCAAG ACGGAAATGG TTATGGTCAA 

51 CGATGAGCCT GCCAAAATTC TGACTTGGGA TGAAAGCGGC CGATTACTCT 

101 CGGAACTGTC TATCCGCCAC CATCAACGCA ACGGGGTGGT TTTGGAGTGG 

151 TATGAAGATG GTTCTAAAAA GAGCGAAGT. GTTTATCAGG ATGACAAGTT 

201 GGT CAGGAAA ACCCAGTGGG ATAAGGATGG TTATTTAATC GAACCCTGA 

This corresponds to the amino acid sequence <SEQ ID 698; ORF27>: 

1 . . KQWYADXSIK TEMVMVNDEP AKILTWDESG RLLSELSIRH HQRNGVVLEW
51 YEDGSKKSEX VYQDDKLVRK TQWDKDGYLI EP* 
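
The correspondence between a nucleotide listing and its amino acid listing can be checked by translating the DNA codon by codon. The following is a minimal sketch, assuming the standard genetic code applies and that any codon containing an unresolved character (such as N or a missing base) should be reported as X; the function name `translate` is illustrative and not taken from this document.

```python
# Standard genetic code laid out in TCAG order (assumption: the standard
# table applies to these sequences).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 1; codons that cannot be resolved become 'X'."""
    seq = "".join(dna.split()).upper()  # drop the spaces between 10-base blocks
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # First thirty bases of a listing that begins ATGAAAAAAT TATCTCGGAT ...
    print(translate("ATGAAAAAAT TATCTCGGAT TGTATTTTCA"))
```

A stop codon is printed as `*`, matching the convention used in the amino acid listings.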

Further work revealed the complete nucleotide sequence <SEQ ID 699>:

1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGAA 

101 AGCTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG 

151 GTGGCGGGTA TTGCGCACGC GCAGGATTTT TATTATCCGT CGATGAAGAA 

201 ATATTCTGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA 

301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGCT 

401 TGAGTGAGGG TACGGGATAC CGCTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAGCAAAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA

501 TGCCGACGGC AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG

551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTCTC GGAACTGTCT

601 ATCCGCCACC ATCAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG

651 TTCTAAAAAG AGCGAAGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA

701 CCCAGTGGGA TAAGGATGGT TATTTAATCG AACCCTGA
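
Listings of this kind are laid out as a position number followed by ten-character blocks. For downstream analysis it is usually convenient to collapse them into one contiguous string. The helper below is a small sketch of that step; `join_listing` is a hypothetical name, and the choice to keep letters only (so that position numbers, spaces and stray punctuation are dropped) is an assumption made for illustration.

```python
import re


def join_listing(lines):
    """Collapse numbered, space-separated sequence blocks into one string.

    Only letters are kept, so position numbers, blanks, dots and a trailing
    '*' stop marker are all removed.
    """
    return "".join(re.sub(r"[^A-Za-z]", "", line) for line in lines)


if __name__ == "__main__":
    rows = [
        "651 TTCTAAAAAG AGCGAAGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA",
        "701 CCCAGTGGGA TAAGGATGGT TATTTAATCG AACCCTGA",
    ]
    print(join_listing(rows))
```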

This corresponds to the amino acid sequence <SEQ ID 700; ORF27-1>:

1 MKKLSRIVFS TVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV

51 VAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES

151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS

201 IRHHQRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF27 shows 91.5% identity over an 82 aa overlap with an ORF (ORF27a) from strain A of N.
meningitidis:

                                                  10        20        30
orf27.pep                                 KQWYADXSIKTEMVMVNDEPAKILTWDESG
                                          |||||| :||||||||||||||||||||||
orf27a      LSEGTGXRYYRNGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVNDEPAKILTWDESG
                140       150       160       170       180       190

                    40        50        60        70        80
orf27.pep   RLLSELSIRHHQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEPX
            ||||||||:|| ||||||||||||||| | |||||||||||||| ||||||||
orf27a      RLLSELSIHHHXRNGVVLEWYEDGSKKXEAVYQDDKLVRKTQWDXDGYLIEPX
                  200       210       220       230       240

The complete length ORF27a nucleotide sequence <SEQ ID 701> is:

1 ATGAAAAAAT TATCTCGGAT TGTATTTTCA ACTGTCCTGT TGGGTTTTTC

51 GGCCGCTTTG CCGGCGCAGA NCTATTCTGT TTATTTTAAT CAGAACGGGA 

101 AACTGACGGC GACGNTGTCT TCTGCCGCNT ATATCAGGCA ATATAGTGTG

151 GCGGAGGGTA TTGCGCACGC GCAGGANTTT TANTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA NGGTCAGAAA 

301 AAAATGGCNG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AGTGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCCGT TATGCCTTAT AAAAATGGTT 

401 TGAGTGAAGG TACGGGGTNN CGCTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAACAGAA TAAGGCAAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGACGGC AATATCAAAA CGGAAATGGT TATGGTCAAT GATGAGCCTG 

551 CCAAAATTCT GACATGGGAT GAAAGCGGTC GATTACTCTC GGAACTGTCT

601 ATCCATCATC ATNAACGTAA TGGAGTAGTC TTAGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG ANTGAAGCTG TTTATCAGGA TGATAAGTTG GTCAGGAAAA 

701 CCCAGTGGGA TAANGATGGT TATTTAATCG AACCCTGA 

This encodes a protein having amino acid sequence <SEQ ID 702>: 



1 MKKLSRIVFS TVLLGFSAAL PAQXYSVYFN QNGKLTATXS SAAYIRQYSV 

51 AEGIAHAQXF XYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFXGQK

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGX RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG NIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IHHHXRNGVV LEWYEDGSKK XEAVYQDDKL VRKTQWDXDG YLIEP*

ORF27a and ORF27-1 show 94.7% identity in 245 aa overlap: 



                    10        20        30        40        50        60
orf27a.pep  MKKLSRIVFSTVLLGFSAALPAQXYSVYFNQNGKLTATXSSAAYIRQYSVAEGIAHAQXF
            |||||||||||||||||||||||:|||||||||||||| |||||||||||: |||||| |
orf27-1     MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf27a.pep  XYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFXGQKKMAGGFSKGKPDGEWVNWYP
             ||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||
orf27-1     YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf27a.pep  NGKKSAVMPYKNGLSEGTGXRYYRNGGKESEIQFKQNKANGVWKQWYADGNIKTEMVMVN
            ||||||||||||||||||| ||||||||||||||||||||||||||||||:|||||||||
orf27-1     NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf27a.pep  DEPAKILTWDESGRLLSELSIHHHXRNGVVLEWYEDGSKKXEAVYQDDKLVRKTQWDXDG
            |||||||||||||||||||||:|| ||||||||||||||| |||||||||||||||| ||
orf27-1     DEPAKILTWDESGRLLSELSIRHHQRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
                   190       200       210       220       230       240

orf27a.pep  YLIEPX
            ||||||
orf27-1     YLIEPX



CHIR-0160 (356.001) PATENT 


Homology with a predicted ORF from N. gonorrhoeae

ORF27 shows 96.3% identity over 82 aa overlap with a predicted ORF (ORF27ng) from 
N. gonorrhoeae: 

orf27.pep                                 KQWYADXSIKTEMVMVNDEPAKILTWDESG 30
                                          |||||| |||||||||||||||||||||||
orf27ng     LSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVNDEPAKILTWDESG 193

orf27.pep   RLLSELSIRHHQRNGVVLEWYEDGSKKSEXVYQDDKLVRKTQWDKDGYLIEP 82
            |||||||||||:||||||||||||||||| ||||||||||||||||||||||
orf27ng     RLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDGYLIEP 245

The complete length ORF27ng nucleotide sequence <SEQ ID 703> is: 

1 ATGAAGAAAT TATCTCGGAT TGTATTTTCA ATCGTACTGT TGGGTTTTTC 

51 GGCCGCTTTG CCGGCGCAGA CCTATTCTGT TTATTTTAAT CAGAACGGGA 

101 AACTGACGGC GACGATGTCT TCTGCCGCTT ATATCAGGCA ATATAGTGTG

151 GCGGCGGGTA TCGCACACGC GCAGGATTTT TATTATCCGT CGATGAAGAA 

201 ATATTCCGAA CCTTATATCG TTGCTTCAAC GCAAATCAAA TCTTTTGTGC 

251 CTACCCTGCA AAACGGTATG TTGATTTTGT GGCATTTTAA TGGTCAGAAA 

301 AAAATGGCGG GGGGCTTCAG CAAGGGTAAG CCGGACGGGG AATGGGTCAA 

351 CTGGTATCCG AACGGTAAAA AATCTGCGGT TATGCCTTAT AAAAATGGCT 

401 TGAGTGAGGG TACGGGATAC CGTTATTACC GTAACGGCGG CAAGGAAAGC 

451 GAAATCCAGT TTAAGCAAAA TAAGGCGAAC GGCGTATGGA AGCAATGGTA 

501 TGCCGATGGA AGTATCAAGA CGGAAATGGT TATGGTCAAC GATGAGCCTG 

551 CCAAAATTCT GACTTGGGAT GAAAGCGGCC GATTACTTTC GGAACTGTCT 

601 ATCCGCCACC ATAAACGCAA CGGGGTGGTT TTGGAGTGGT ATGAAGATGG 

651 TTCTAAAAAG AGCGAGGCTG TTTATCAGGA TGACAAGTTG GTCAGGAAAA 

701 CCCAATGGGA TAAGGATGGT TATTTAATCG AACCCTGA 

This encodes a protein having amino acid sequence <SEQ ID 704>: 

1 MKKLSRIVFS IVLLGFSAAL PAQTYSVYFN QNGKLTATMS SAAYIRQYSV

51 AAGIAHAQDF YYPSMKKYSE PYIVASTQIK SFVPTLQNGM LILWHFNGQK 

101 KMAGGFSKGK PDGEWVNWYP NGKKSAVMPY KNGLSEGTGY RYYRNGGKES 

151 EIQFKQNKAN GVWKQWYADG SIKTEMVMVN DEPAKILTWD ESGRLLSELS 

201 IRHHKRNGVV LEWYEDGSKK SEAVYQDDKL VRKTQWDKDG YLIEP*

ORF27ng and ORF27-1 show 98.8% identity in 245 aa overlap: 

                    10        20        30        40        50        60
orf27-1.pep MKKLSRIVFSTVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVVAGIAHAQDF
            |||||||||| |||||||||||||||||||||||||||||||||||||||:|||||||||
orf27ng     MKKLSRIVFSIVLLGFSAALPAQTYSVYFNQNGKLTATMSSAAYIRQYSVAAGIAHAQDF
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf27-1.pep YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf27ng     YYPSMKKYSEPYIVASTQIKSFVPTLQNGMLILWHFNGQKKMAGGFSKGKPDGEWVNWYP
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf27-1.pep NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf27ng     NGKKSAVMPYKNGLSEGTGYRYYRNGGKESEIQFKQNKANGVWKQWYADGSIKTEMVMVN
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf27-1.pep DEPAKILTWDESGRLLSELSIRHHQRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
            ||||||||||||||||||||||||:|||||||||||||||||||||||||||||||||||
orf27ng     DEPAKILTWDESGRLLSELSIRHHKRNGVVLEWYEDGSKKSEAVYQDDKLVRKTQWDKDG
                   190       200       210       220       230       240

orf27-1.pep YLIEPX
            ||||||
orf27ng     YLIEPX
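
The identity percentages quoted for these overlaps are the number of identical aligned positions divided by the overlap length. For instance, if a 245-residue ungapped overlap contains three mismatches, the figure is 242/245, or about 98.8%. The sketch below shows that calculation for two pre-aligned strings of equal length; `percent_identity` is an illustrative name, and the handling of gaps (never counted as matches) is an assumption rather than a description of the alignment program used here.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    # A gap character never counts as a match, even against another gap.
    matches = sum(1 for x, y in zip(a, b) if x == y and x != "-")
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Two eleven-residue fragments differing at the final position.
    print(round(percent_identity("MKKLSRIVFST", "MKKLSRIVFSI"), 1))
```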

Based on this analysis, including the putative leader sequence in the gonococcal protein, it was 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF27-1 (24.5 kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
17A shows the results of affinity purification of the GST-fusion protein, and Figure 17B shows the
results of expression of the His-fusion in E. coli. Purified GST-fusion protein was used to immunise
mice, whose sera were used for ELISA, which gave a positive result, confirming that ORF27-1 is
a surface-exposed protein and a useful immunogen.

Example 84 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 705>: 

1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACkAG CTGTCCGGTT TCTATTGGCA CGCGCATGAg 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTaTCTGGTC 

251 GGCTTGACTA TCTTTTGGCT GGCTGCGCGG ATTGCCGCCT TTATCCCGGG 

301 TTGGGGTGCG TCGGCAAGCG GCATACTCGG TACGCTGTTT TTCTGGTACG 

351 GCGCGGTGTG CATGGCTTTG CCCGTTATCC GTTCGCAGAA TCAACGCAAC 

401 TATGTTgCCG TGTTCGCGCT GTTCGTCTTG GGCGGCACGC ATGCGGCGTT 

451 CCACGTCCAG CTGCACAACG GCAACCTAGG CGGACTCTTG AGCGGATTGC 

501 AGTCGGGCTT GGTGATG 

This corresponds to the amino acid sequence <SEQ ID 706; ORF47>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHX LSGFYWHAHE 

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG

101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF 

151 HVQLHNGNLG GLLSGLQSGL VM 

Further work revealed the complete nucleotide sequence <SEQ ID 707>: 

1 ATGAAATTTA CCAAGCACCC CGTCTGGGCA ATGGCGTTCC GCCCATTTTA 

51 TTCGCTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG
251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT
301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG
351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAACT
401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGCACGCA TGCGGCGTTC
451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA
501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA
551 TTATTTCGTT TTTTACGTCC AAACGCTTGA ATGTGCCGCA GATTCCCAGT
601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACTGCCAT
651 GCTGATGGCG CACGGTGTGT TGGCTTGGCT GTCTGCCGTT TTTGCCTTTG
701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAACCC
751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC
801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC
851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT
901 TTGGGCATGA TGGCGCGTAC CGCGCTTGGT CATACGGGCA ATCCGATTTA
951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA
1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC
1051 AGCATCCGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC
1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG
1151 GTTGA

This corresponds to the amino acid sequence <SEQ ID 708; ORF47-1>:

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG

101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRIISFFTS KRLNVPQIPS

201 PKWVAQASLW LPMLTAMLMA HGVLAWLSAV FAFAAGVIFT VQVYRWWYKP

251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT

301 LGMMARTALG HTGNPIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH

351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

Computer analysis of this amino acid sequence predicts a leader peptide and also gave the 
following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF47 shows 99.4% identity over a 172 aa overlap with an ORF (ORF47a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf47.pep   MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHXLSGFYWHAHEMIWGYAGLVV
            ||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||
orf47a      MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf47.pep   IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47a      IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
                    70        80        90       100       110       120

                   130       140       150       160       170
orf47.pep   MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVM
            ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47a      MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                   130       140       150       160       170       180

orf47a      GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIF
                   190       200       210       220       230        24

The complete length ORF47a nucleotide sequence <SEQ ID 709> is: 



1 ATGAAATTTA CCAAGCACCC CGTTTGGGCA ATGGCGTTCC GCCCGTTTTA 

51 TTCACTGGCG GCTCTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG 

101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG 

151 ATGATTTGGG GTTATGCCGG ACTGGTCGTC ATCGCCTTCC TGCTGACCGC 

201 CGTCGCCACT TGGACGGGGC AGCCGCCCAC GCGGGGCGGC GTTCTGGTCG 

251 GCTTGACTAT CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT 

301 TGGGGTGCGT CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG
351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TTCGCAGAAT CAACGCAATT
401 ATGTTGCCGT GTTCGCGCTG TTCGTCTTGG GCGGTACGCA CGCGGCGTTC
451 CACGTCCAGC TGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA
501 GTCGGGCTTG GTGATGGTGT CGGGTTTTAT CGGTCTGATT GGTACGCGGA
551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ATGTGCCGCA GATTCCCAGT
601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTGCCCATGC TGACCGCCAT
651 GCTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG
701 CGGCAGGTGT GATTTTTACC GTGCAGGTGT ACCGCTGGTG GTATAAGCCT
751 GTGTTGAAAG AGCCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC
801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCCGCTTTCC
851 TCAATCTGGG TGTGCATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT
901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATCCGATTTA
951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA
1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC
1051 AGCATACGCA CCTCTTCGGT TTTGTTTGCA CTCGCGCTTT TGGTGTATGC
1101 GTGGAAGTAT ATTCCTTGGC TGATTCGTCC GCGTTCGGAC GGCAGGCCCG
1151 GTTGA

This encodes a protein having amino acid sequence <SEQ ID 710>: 

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTIFWL AARIAAFIPG

101 WGASASGILG TLFFWYGAVC MALPVIRSQN QRNYVAVFAL FVLGGTHAAF

151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GTRIISFFTS KRLNVPQIPS

201 PKWVAQASLW LPMLTAMLMA HGVMPWLSAA FAFAAGVIFT VQVYRWWYKP

251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT

301 LGMMARTALG HTGNPIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH

351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

ORF47a and ORF47-1 show 99.2% identity in 384 aa overlap: 

                    10        20        30        40        50        60
orf47a.pep  MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf47a.pep  IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf47a.pep  MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf47a.pep  GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVMPWLSAAFAFAAGVIFT
            |||||||||||||||||||||||||||||||||||||||||||: ||||:||||||||||
orf47-1     GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf47a.pep  VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf47a.pep  LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47-1     LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
                   310       320       330       340       350       360

                   370       380
orf47a.pep  LALLVYAWKYIPWLIRPRSDGRPGX
            |||||||||||||||||||||||||
orf47-1     LALLVYAWKYIPWLIRPRSDGRPGX
                   370       380

Homology with a predicted ORF from N.gonorrhoeae 
ORF47 shows 97.1% identity over 172 aa overlap with a predicted ORF (ORF47ng) from
N. gonorrhoeae:

ORF47       MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV 60
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
ORF47ng     MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV 60

ORF47       IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC 120
            |||||||||||||||||||||||||| ||||||||||||||||:||||||||||||||||
ORF47ng     IAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC 120

ORF47       MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVM 172
            ||||||||||:||||||||:||||||||||||||||||||||||||||||||
ORF47ng     MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVWGFIGLI 180

The ORF47ng nucleotide sequence <SEQ ID 711> is predicted to encode a protein comprising
amino acid sequence <SEQ ID 712>:

1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE

51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTAFWL AARIAAFIPG

101 WGAAASGILG TLFFWYGAVC MALPVIRSQN RRNYVAVFAI FVLGGTHAAF

151 HVQLHNGNLG GLLSGLQSGL VMVWGFIGLI GMKIISFFTS KRLKLPQIPS

201 PKWVAHASLW LPMLNAILMA HRVMPWLSAA FPFAAGVIFT VQVYAGGITP

251 IEETSCGSVA GICYRLGNSS G

The predicted leader peptide and transmembrane domains are identical (except for an Ile/Ala
substitution at residue 87 and a Leu/Ile substitution at position 140) to sequences in the
meningococcal protein (see also Pseudomonas stutzeri orf396, accession number e246540):



[ORF47ng / ORF47 sequence comparison]

TM segments in ORF47ng
INTEGRAL   Likelihood = -3.08   Transmembrane 134 - 150
INTEGRAL   Likelihood = -1.91   Transmembrane 107 - 123
INTEGRAL   Likelihood = -1.44   Transmembrane 227 - 243
INTEGRAL   Likelihood = [illegible]   Transmembrane [illegible]
INTEGRAL   Likelihood = [illegible]   Transmembrane [illegible]
INTEGRAL   Likelihood = [illegible]   Transmembrane [illegible]



Further work revealed the complete gonococcal DNA sequence <SEQ ID 713>:

1 ATGAAATTTA CCAAACATCC CGTCTGGGCA ATGGCGTTCC GCCCGTTTTA
51 TTCACTGGCG GCACTGTACG GCGCATTGTC CGTATTGCTG TGGGGTTTCG
101 GCTACACGGG AACGCACGAG CTGTCCGGTT TCTATTGGCA CGCGCATGAG
151 ATGATTTGGG GTTATGCCGG TCTCGTCGTC ATCGCCTTCC TGCTGACCGC
201 CGTCGCCACT TGGACGGGAC AGCCGCCCAC GAGGGGCGGC GTTCTGGTCG
251 GCTTGACCGC CTTTTGGCTG GCTGCGCGGA TTGCCGCCTT TATCCCGGGT
301 TGGGGTGCGG CGGCAAGCGG CATACTCGGT ACGCTGTTTT TCTGGTACGG
351 CGCGGTGTGC ATGGCTTTGC CCGTTATCCG TtcgCAAAAC CGGCGCAACT
401 ATGtcgCCGT ATTCGCAATA TTTGTGCTGG GCGGTACGCA TGCGgcgTTC
451 CACGtccAgc tGCACAACGG CAACCTAGGC GGACTCTTGA GCGGATTGCA
501 GTCGGGCCTG GTTATGGTGT CGGGCTTTAT CGGCCTGATT GGGATGAGGA
551 TTATTTCGTT TTTTACGTCC AAACGGTTGA ACGTGCCGCA GATTCCCAGT
601 CCGAAATGGG TGGCGCAGGC TTCGCTGTGG CTACCCATGC TGACCGCCAT
651 ACTGATGGCG CACGGCGTGA TGCCTTGGCT GTCGGCGGCT TTCGCGTTTG
701 CGGCGGGCGT GATTTTTACC GTACAGGTGT ACCGCTGGTG GTATAAACCC
751 GTATTGAAAG AACCGATGCT GTGGATTCTG TTTGCCGGCT ATCTGTTTAC
801 CGGATTGGGG CTGATTGCGG TCGGCGCGTC TTATTTCAAA CCTGCCTTCC
851 TCAATCTGGG CGTACATCTG ATCGGGGTCG GCGGTATCGG CGTGCTGACT
901 TTGGGCATGA TGGCGCGTAC CGCGCTCGGT CATACGGGCA ATTCGATTTA
951 TCCGCCGCCC AAAGCCGTTC CCGTTGCGTT TTGGCTGATG ATGGCGGCAA
1001 CCGCCGTCCG TATGGTTGCC GTATTTTCTT CCGGCACTGC CTACACGCAC
1051 AGCATCCGCA CGTCTTCGGT TTTGTTTGCA CTCGCGCTGC TGGTGTATGC
1101 GTGGAAATAC ATTCCGTGGC TGATCCGTCC GCGTTCGGAC GGCAGGCCCG
1151 GTTGA

This encodes a protein having amino acid sequence <SEQ ID 714; ORF47ng-1>:



1 MKFTKHPVWA MAFRPFYSLA ALYGALSVLL WGFGYTGTHE LSGFYWHAHE
51 MIWGYAGLVV IAFLLTAVAT WTGQPPTRGG VLVGLTAFWL AARIAAFIPG
101 WGAAASGILG TLFFWYGAVC MALPVIRSQN RRNYVAVFAI FVLGGTHAAF
151 HVQLHNGNLG GLLSGLQSGL VMVSGFIGLI GMRIISFFTS KRLNVPQIPS
201 PKWVAQASLW LPMLTAILMA HGVMPWLSAA FAFAAGVIFT VQVYRWWYKP
251 VLKEPMLWIL FAGYLFTGLG LIAVGASYFK PAFLNLGVHL IGVGGIGVLT
301 LGMMARTALG HTGNSIYPPP KAVPVAFWLM MAATAVRMVA VFSSGTAYTH
351 SIRTSSVLFA LALLVYAWKY IPWLIRPRSD GRPG*

ORF47ng-1 and ORF47-1 show 97.4% identity in 384 aa overlap:

                    10        20        30        40        50        60
orf47-1.pep MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1   MKFTKHPVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFYWHAHEMIWGYAGLVV
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf47-1.pep IAFLLTAVATWTGQPPTRGGVLVGLTIFWLAARIAAFIPGWGASASGILGTLFFWYGAVC
            |||||||||||||||||||||||||| ||||||||||||||||:||||||||||||||||
orf47ng-1   IAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAVC
                    70        80        90       100       110       120

                   130       140       150       160       170       180
orf47-1.pep MALPVIRSQNQRNYVAVFALFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
            ||||||||||:||||||||:||||||||||||||||||||||||||||||||||||||||
orf47ng-1   MALPVIRSQNRRNYVAVFAIFVLGGTHAAFHVQLHNGNLGGLLSGLQSGLVMVSGFIGLI
                   130       140       150       160       170       180

                   190       200       210       220       230       240
orf47-1.pep GTRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAMLMAHGVLAWLSAVFAFAAGVIFT
            | ||||||||||||||||||||||||||||||||||:||||||: ||||:||||||||||
orf47ng-1   GMRIISFFTSKRLNVPQIPSPKWVAQASLWLPMLTAILMAHGVMPWLSAAFAFAAGVIFT
                   190       200       210       220       230       240

                   250       260       270       280       290       300
orf47-1.pep VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1   VQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYFKPAFLNLGVHLIGVGGIGVLT
                   250       260       270       280       290       300

                   310       320       330       340       350       360
orf47-1.pep LGMMARTALGHTGNPIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
            |||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||
orf47ng-1   LGMMARTALGHTGNSIYPPPKAVPVAFWLMMAATAVRMVAVFSSGTAYTHSIRTSSVLFA
                   310       320       330       340       350       360



Furthermore, ORF47ng-1 shows significant homology to an ORF from Pseudomonas stutzeri:

gnl|PID|e246540 (Z73914) ORF396 protein [Pseudomonas stutzeri] Length = 396
Score = 155 bits (389), Expect = 5e-37
Identities = 121/391 (30%), Positives = 169/391 (42%), Gaps = 21/391 (5%)

Query: 7   PVWAMAFRPFYSLAALYGALSVLLWGFGYTGTHELSGFY-------WHAHEMIWGYAGLV 59
           P+W +AFRPF+ +LY L++ LW +TG GF WH HEM++G+A +
Sbjct: 14  PIWRLAFRPFFLAGSLYALLAIPLWVAAWTGLWP--GFQPTGGWLAWHRHEMLFGFAMAI 71

Query: 60  VIAFLLTAVATWTGQPPTRGGVLVGLTAFWLAARIAAFIPGWGAAASGILGTLFFWYGAV 119
           V FLLTAV TWTGQ G LVGL A WLAAR+ ++ G AA L LF
Sbjct: 72  VAGFLLTAVQTWTGQTAPSGNRLVGLAAVWLAARL-GWLFGLPAAWLAPLDLLFLVALVW 130

Query: 120
Sbjct: 131

Query: 180
           IG R+I FFT -
Sbjct: 191 IGGRVIPFFTQRGLGKVDAVKPWWLDVALLVGTGVIALLHAFGVAMRPQPLLGLLFV-A 249

Query: 235 AGVIFTVQVYRWWYKPVLKEPMLWILFAGYLFTGLGLIAVGASYF-KPAFXXXXXXXXXX 293
           GV +++ RW+ K + K +LW L L+ + + +F A
Sbjct: 250 IGVGHLLRLMRWYDKGIWKVGLLWSLHVAMLWLVVAAFGLALWHFGLLAQSSPSLHALSV 309

Query: 294 XXXXXXXXXMMARTALGHTGNSIYPPPKAVPVAFWLXXXXXXXXXXXXFSSGTAYTHSIR 353
           M+AR LGHTG +P+AFL FS +
Sbjct: 310 GSMSGLILAMIARVTLGHTGRPLQLPAGIIG-AFVL---FNLGTAARVFLSVAWPVGGLW 365

Query: 354 TSSVLFALALLVYAWKYIPWLIRPRSDGRPG 384
           ++V + LA +Y W+Y P L+ R DG PG
Sbjct: 366 LAAVCWTLAFALYVWRYAPMLVAARVDGHPG 396

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 85 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 715>:

1 . . ATGCCGTCTG AAGGTTCAGA CGGCmTCGGT GyCGGGGAAy CAGAAGyGGT 

51 AGCGCATGCC CAATGAGACT TCGTGGGTTT TGAAGCGGGT GTTTTCCAAG 

101 CGTCCCCAGT TGTGGTAACG GTATCCGGTG TCyAArGTCA GCTTGGGyGT 

151 GATGTCGAAa CCGACACCGG CGATGACACC AAGACCyAmG CTGCTGATrC 

201 TGTkGCTTTC GTGATAGGsA GGTTTGyTGG kmksAsyTTG TAyrATwkkG 

251 CCTssCwsTG kAGmGCCkTk CkyTGGTkkA swGrwArTAG TCGTGGTTTy 

301 TkTTyyCACC GAATGAACyT GATGTTTAAC GTGTCCGTAG GCGACGCGCG 

351 CGCCGATATA GGGTTTGAAT TTATCGTTGA GTTTGAAATC GTAAATGGCG 

401 GACAAGCCGA GAGAAGAAAC GGCGTGGAAG CTGCCGTTTC CCTGATGTTT 

451 TGTTTGGGTT TCTTTGTAGT TGTTGTTTAT CTCTTCAGTA ACTTTTTTAG 

501 TAGAAGAATT ACTTTCTTTC CATTTTCTGT AACTGGCATA ATCTGCCGCT 

551 ATTCTCCAGC CGCCGAAATC . . 

This corresponds to the amino acid sequence <SEQ ID 716; ORF67>: 



1 . . MPSEGSDGXG XGEXEXVAHA QXDFVGFEAG VFQASPVWT VSGVXXQLGX 

51 DVETDTGDDT KTXAADXVAF VIGRFXGXXL YXXAXXXXAX XWXXXXSRGF 

101 XXHRMNLMFN VSVGDARADI GFEFIVEFEI VNGGQAERRN GVEAAVSLMF 

151 CLGFFWWY LFSNFFSRRI TFFPFSVTGI ICRYSPAAEI . . 
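
Partial listings such as this one contain IUPAC ambiguity codes (for example m, y, r, k, s and w), and a codon that contains one may or may not determine a single residue. The sketch below expands each ambiguous codon into every codon it could stand for and reports X when more than one residue is possible. It assumes the standard genetic code and the standard IUPAC nucleotide codes; the function names are illustrative, not taken from this document.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code in TCAG order (assumed to apply here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def possible_residues(codon: str) -> set:
    """Return every residue an ambiguous three-base codon could encode."""
    options = [IUPAC[base] for base in codon.upper()]
    return {CODON_TABLE["".join(bases)] for bases in product(*options)}


def translate_ambiguous(codon: str) -> str:
    """Return the residue if it is uniquely determined, otherwise 'X'."""
    residues = possible_residues(codon)
    return next(iter(residues)) if len(residues) == 1 else "X"


if __name__ == "__main__":
    for codon in ("GGC", "mTC", "GyC", "GGN"):
        print(codon, sorted(possible_residues(codon)), translate_ambiguous(codon))
```

An ambiguity in the third codon position often leaves the residue unchanged, which is why some ambiguous codons still translate to a definite amino acid.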

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N.gonorrhoeae 

ORF67 shows 51.8% identity over 199 aa overlap with a predicted ORF (ORF67ng) from 
N.gonorrhoeae: 

orf67.pep                                 MPSEGSDGXGXGEXEXVAHAQXDFVGFEAG 30

orf67ng     TNFEIAVLSGMTVRVFYCARPAPVNGGRLKMPSEGSDGIGIGESEAVAHAQRGFVGFEAG 146

              90       100       110       120       130       140

orf67.pep   VFQASPWVTVSGVXXQLGXDVETDTGDDTKTXAADXVAFVIGRFXGXXLYXXAXXXXAX 90
            I I I I I I I I I : I : I I MM:: : : : I I I [ I : [ I I : :

orf67ng     VFQASPWVAVAGVQGQAGRDVYAHARHRAEAQAAAAVAFLIGVFLRMSVRINRNCCVSI 206

orf67.pep   XWXXXXSRGFXXHRMNLMFNVSVGDARADIGFEFIVEFEIVNGGQAERRNGVEAAVSLMF 150
            : I : I : : : : I I I i I I I : I I I I I I : I I I I I i I i I I I I I I I I I ! ! I I I I

orf67ng     TRVGGKSTCYFFSRIDAVSDVSVGDARTDIGFEFWEFEIWGGQAERRNGVECAVFLMF 266






orf67 pep CLGFFW VVYLFSNFFSRRITFF-PFSVTGIICRYSPAAEI 190 

| || : : | : | : : I : I I I I i I I : I I I I : 

orf67ng RLLVFYVKLVAAKSFIILSFQLFYVHGIFIWPFPVTGIIRGDAPAAEWADRHPGVDGM 326 

The ORF67ng nucleotide sequence <SEQ ID 717> is predicted to encode a protein comprising
amino acid sequence <SEQ ID 718>:

1 MPSETVGSIV NVGVDESVGF sppfpsiqhf yrfhrihrir lfrppgpmql 
51 NRHSHGSGNL GRGVWATVLS DKFPCGQVRI PACAGMTNFE IAVLSGMTVR 
101 VFYCARPAPV NGGRLKMPSE GSDGIGIGES EAVAHAQRGF VGFEAGVFQA 
151 SPVWAVAGV QGQAGRDVYA HARHRAEAQ A AAAVAFLIGV FLRMSV RINR 
201 NCCVSITRVG GKSTCYFFSR IDAVSDVSVG DARTDIGFEF WEFEIVNGG 
251 QAERRNGVE C AVFLMFRLLV FYVKLVA AKS F IILSFQLFY VHGIFIW PF 
301 PVTGI IRGDA PAAEWADRH PGVDGMRTDV SEIIAYRAYF VFAWSGWFRI 
351 IVGNAFGGVG * 

Based on the presence of several putative transmembrane domains in the gonococcal protein, it
is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be
useful antigens for vaccines or diagnostics, or for raising antibodies.
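
Putative transmembrane domains are commonly flagged by scanning a protein for stretches of hydrophobic residues. The sketch below uses the Kyte-Doolittle hydropathy scale with a sliding window. This is a generic technique offered for illustration only and is not necessarily the prediction method used for the results reported here. The window of 19 residues and the threshold of 1.6 are conventional values for that scale; the function names are hypothetical, and unknown residues such as X are scored as 0.0 by assumption.

```python
# Kyte-Doolittle hydropathy values (a published scale; not necessarily the
# method behind the predictions in this document).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Mean hydropathy of every full window along the sequence."""
    residues = [r for r in protein.upper() if r.isalpha()]
    # Unknown residues (for example X) contribute 0.0 by assumption.
    scores = [KYTE_DOOLITTLE.get(r, 0.0) for r in residues]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def hydrophobic_windows(protein, window=19, threshold=1.6):
    """1-based start positions of windows whose mean reaches the threshold."""
    profile = hydropathy_profile(protein, window)
    return [i + 1 for i, value in enumerate(profile) if value >= threshold]


if __name__ == "__main__":
    fragment = "MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGG"
    print(hydrophobic_windows(fragment))
```

Overlapping windows above the threshold would normally be merged into a single candidate segment before being reported.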



Example 86 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 719>:



1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GArArTCCTA rGGTTCArAC 

251 CTATTGCGsG CATCATGACG CCGrAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCA. . .

This corresponds to the amino acid sequence <SEQ ID 720; ORF78>:



1 MFAFLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP
51 HIMFAVGMLG VLVGDGIMFA AGRIWGQXXL XFXPIAXIMT PXRYEQVQEK
101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFIIM DGLAA. . .

Further work revealed the complete nucleotide sequence <SEQ ID 721>:



1 ATGTTTGCTT TTTTAGAAGC CTTTTTTGTC GAATACGGTT ATGCGGCTGT 

51 TTTTTTTGTA TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 

101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 

201 CATGTTCGCC GCCGGACGAA TTTGGGGGCA GAAAATCCTA AGGTTCAAAC 

251 CTATTGCGCG CATCATGACG CCGAAACGTT ATGAGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGTAACTG GGTCTTATTT GTCGCCCGTT TCCTGCCCGG 

351 TTTGAGAACG GCCGTATTTG TTACAGCCGG TATCAGCCGC AAGGTTTCAT 

401 ACTTGCGTTT TATCATTATG GATGGACTGG CCGCACTGAT TTCCGTCCCT

451 ATTTGGATTT ATCTGGGCGA ATACGGTGCG CACAACATCG ATTGGCTGAT 

501 GGCGAAAATG CACAGCCTGC AATCGGGTAT TTTTGTTATC TTGGGTATAG 

551 GTGCGACCGT TGTCGCTTGG ATTTGGTGGA AAAAACGCCA ACGTATCCAG 

601 TTTTACCGCA GCAAATTGAA AGAAAAGCGG GCGCAACGCA AAGCCGCCAA 

651 GGCAGCCAAA AAAGCCGCGC AAAGCAAACA ATAA 

This corresponds to the amino acid sequence <SEQ ID 722; ORF78-1>:



1 MFAFLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP

51 HIMFAVGMLG VLVGDGIMFA AGRIWGQKIL RFKPIARIMT PKRYEQVQEK

101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFIIM DGLAALISVP

151 IWIYLGEYGA HNIDWLMAKM HSLQSGIFVI LGIGATVVAW IWWKKRQRIQ
201 FYRSKLKEKR AQRKAAKAAK KAAQSKQ*

Computer analysis of this amino acid sequence predicts several transmembrane domains, and
gave the following results:

Homology with the dedA homologue of H. influenzae (accession number P45280) 



ORF78 and the dedA homologue show 58% aa identity in 144 aa overlap:

Orf78:      FLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGV 61
            FL FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+ N H+M V M+GV
DedA: 20    FLIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGV 79

Orf78: 62   LVGDGIMFAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTA 121
            L GD M+ GRI+G L F PI I+T R V+EKF +YGN VLFVARFLPGLR
DedA: 80    LAGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAP 139

Orf78: 122  VFVTAGISRKVSYLRFIIMDGLAA 145
            +++ +GI+R+VSY+RF+++D AA
DedA: 140   IYMVSGITRRVSYVRFVLIDFCAA 163

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF78 shows 93.8% identity over a 145 aa overlap with an ORF (ORF78a) from strain A of N.
meningitidis:



                    10        20        30        40        50        60
orf78.pep   MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
            |||:||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf78a      MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf78.pep   VLVGDGIMFAAGRIWGQXXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRT
            |||||||||||||||||  | | ||| |||| || |||||||||||||||||||||||||
orf78a      VLVGDGIMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT
                    70        80        90       100       110       120

                   130       140
orf78.pep   AVFVTAGISRKVSYLRFIIMDGLAA
            |||||||||||||||||:|||||||
orf78a      AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA
                   130       140       150       160       170       180



The complete length ORF78a nucleotide sequence <SEQ ID 723> is: 



1 ATGTTTGCCC TTTTGGAAGC CTTTTTTGTC GAATACGGCT ATGCGGCCGT 
51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAGGATT 
101 TGACCTTGGT AACAGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 
151 CATATTATGT TTGCAGTCGG TATGCTCGGC GTATTGGTCG GGGACGGCAT 
201 CATGTTCGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 
251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCACAGGT TCAGGAAAAA 
301 TTCGACAAAT ACGGCAACTG GGTGTTATTT GTCGCTCGTT TCCTGCCCGG 
351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT 
401 ATCTGCGCTT TCTGATTATG GACGGGCTTG CCGCGCTGAT TTCCGTGCCC 
451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT 
501 GGCGAAAATG CACAGCCTGC AATCCGGCAT CTTCATCGCA TTGGGCGTGC 
551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG 
601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 
651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAA 



This encodes a protein having amino acid sequence <SEQ ID 724>: 

1 MFALLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 
51 HIMFAVGMLG VLVGDGIMFA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK 
101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFLIM DGLAALISVP 
151 VWIYLGEYGA HNIDWLMAKM HSLQSGIFIA LGVLAAALAW FWWRKRRHYQ 
201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ* 

ORF78a and ORF78-1 show 89.0% identity in 227 aa overlap: 

                    10        20        30        40        50        60 
orf78a.pep  MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 
            |||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf78-1     MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 
                    10        20        30        40        50        60 

                    70        80        90       100       110       120 
orf78a.pep  VLVGDGIMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT 
            ||||||||||||||||||||:||||||||||||| ||||||||||||||||||||||||| 
orf78-1     VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT 
                    70        80        90       100       110       120 

                   130       140       150       160       170       180 
orf78a.pep  AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA 
            |||||||||||||||||:||||||||||||:|||||||||||||||||||||||||||: 
orf78-1     AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFVI 
                   130       140       150       160       170       180 

                   190       200       210       220 
orf78a.pep  LGVLAAALAWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX 
            ||: |:::||:||:||:: |:||::|:||||:||| ||||||||::|| 
orf78-1     LGIGATVVAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX 
                   190       200       210       220 



Homology with a predicted ORF from N. gonorrhoeae 

ORF78 shows 97.4% identity over 38 aa overlap with a predicted ORF (ORF78ng) from N. 
gonorrhoeae: 

orf78.pep   XXLXFXPIAXIMTPXRYEQVQEKFDKYGNWVLFVARFLPGLRTAVFVTAGISRKVSYLRF 137 
                                          |||||||||||||||||||||||||||||| 
orf78ng                                 YPVLFVARFLPGLRTAVFVTAGISRKVSYLRF 32 

orf78.pep   IIMDGLAA 145 
            :||||||| 
orf78ng     LIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALGVLAAALAWFWWRKRR 92 

The ORF78ng nucleotide sequence <SEQ ID 725> is predicted to encode a protein comprising 
amino acid sequence <SEQ ID 726>: 



1 ..YPVLFVARFL PGLRTAVFVT AGISRKVSYL RFLIMDGLAA LISVPVWIYL 
51 GEYGAHNIDW LMAKMHSLQS GIFIALGVLA AALAWFWWRK RRHYQLYRAQ 
101 LSEKRAKRKA EKAAKKAAQK QQ* 

Further work revealed the complete gonococcal nucleotide sequence <SEQ ID 727>: 



1 atgtttgccc tttTggaagc CTTTTTTGTC GAAtacggCt atgcGGCCGT 

51 GTTTTTCGTT TTGGTCATCT GCGGTTTCGG CGTGCCGATT CCCGAAGATT 

101 TGACCTTGGT AACGGGCGGC GTGATTTCGG GTATGGGTTA TACCAATCCG 

151 CATATTATGT TTGCGGTCGG TATGCTCGGC GTGTTGGCGG GCGACGGCGT 

201 GATGTTTGCC GCCGGACGCA TCTGGGGGCA GAAAATCCTC AAGTTCAAAC 

251 CGATTGCGCG CATCATGACG CCGAAACGTT ACGCGCAGGT TCAGGAAAAA 

301 TTCGACAAAT ACGGCAACTG GGTTCTGTTT GTCGCCCGTT TCCTGCCGGG 

351 TTTGCGGACT GCCGTTTTCG TTACCGCCGG CATCAGCCGC AAAGTATCGT 

401 ATCTGCGCTT TCTGATTATG GACGGGCTGG CCGCGCTGAT TTCCGTGCCC 

451 GTTTGGATTT ACTTGGGCGA GTACGGCGCG CACAACATCG ATTGGCTGAT 

501 GGCGAAAATG CACAGCCTGC AATCGGGCAT CTTCATCGCA TTGGGCGTGC 

551 TGGCGGCGGC GCTGGCGTGG TTCTGGTGGC GCAAACGCCG ACATTATCAG 

601 CTTTACCGCG CACAATTGAG CGAAAAACGC GCCAAACGCA AGGCGGAAAA 

651 GGCAGCGAAA AAAGCGGCAC AGAAGCAGCA GTAa 

This corresponds to the amino acid sequence <SEQ ID 728; ORF78ng-1>: 

1 MFALLEAFFV EYGYAAVFFV LVICGFGVPI PEDLTLVTGG VISGMGYTNP 
51 HIMFAVGMLG VLAGDGVMFA AGRIWGQKIL KFKPIARIMT PKRYAQVQEK 
101 FDKYGNWVLF VARFLPGLRT AVFVTAGISR KVSYLRFLIM DGLAALISVP 
151 VWIYLGEYGA HNIDWLMAKM HSLQSGIFIA LGVLAAALAW FWWRKRRHYQ 
201 LYRAQLSEKR AKRKAEKAAK KAAQKQQ* 

ORF78ng-1 and ORF78-1 show 88.1% identity in 227 aa overlap: 

                    10        20        30        40        50        60 
orf78-1.pep MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 
            |||:|||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf78ng-1   MFALLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNPHIMFAVGMLG 
                    10        20        30        40        50        60 

                    70        80        90       100       110       120 
orf78-1.pep VLVGDGIMFAAGRIWGQKILRFKPIARIMTPKRYEQVQEKFDKYGNWVLFVARFLPGLRT 
            ||:|||:|||||||||||||:||||||||||||| ||||||||||||||||||||||||| 
orf78ng-1   VLAGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRT 
                    70        80        90       100       110       120 

                   130       140       150       160       170       180 
orf78-1.pep AVFVTAGISRKVSYLRFIIMDGLAALISVPIWIYLGEYGAHNIDWLMAKMHSLQSGIFVI 
            |||||||||||||||||:||||||||||||:|||||||||||||||||||||||||||: 
orf78ng-1   AVFVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIA 
                   130       140       150       160       170       180 

                   190       200       210       220 
orf78-1.pep LGIGATVVAWIWWKKRQRIQFYRSKLKEKRAQRKAAKAAKKAAQSKQX 
            ||: |:::||:||:||:: |:||::|:||||:||| ||||||||::|| 
orf78ng-1   LGVLAAALAWFWWRKRRHYQLYRAQLSEKRAKRKAEKAAKKAAQKQQX 
                   190       200       210       220 

Furthermore, ORF78ng-1 shows homology to the dedA protein from H. influenzae: 

sp|P45280|YG29_HAEIN HYPOTHETICAL PROTEIN HI1629 >gi|1073983|pir||D64133 dedA 
protein (dedA) homolog - Haemophilus influenzae (strain Rd KW20) 
>gi|1574476 (U32836) dedA protein (dedA) [Haemophilus influenzae] Length = 212 
Score = 223 bits (563), Expect = 7e-58 
Identities = 108/182 (59%), Positives = 140/182 (76%), Gaps = 2/182 (1%) 

Query: 5   LEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGM--GYTNPHIMFAVGMLGVL 62 
           L  FF EYGY AV FVL+ICGFGVPIPED+TLV+GGVI+G+     N H+M  V M+GVL 
Sbjct: 21  LIGFFTEYGYWAVLFVLIICGFGVPIPEDITLVSGGVIAGLYPENVNSHLMLLVSMIGVL 80 

Query: 63  AGDGVMFAAGRIWGQKILKFKPIARIMTPKRYAQVQEKFDKYGNWVLFVARFLPGLRTAV 122 
           AGD  M+  GRI+G KIL+F+PI RI+T +R   V+EKF +YGN VLFVARFLPGLR  + 
Sbjct: 81  AGDSCMYWLGRIYGTKILRFRPIRRIVTLQRLRMVREKFSQYGNRVLFVARFLPGLRAPI 140 

Query: 123 FVTAGISRKVSYLRFLIMDGLAALISVPVWIYLGEYGAHNIDWLMAKMHSLQSGIFIALG 182 
           ++ +GI+R+VSY+RF+++D  AA+ISVP+WIYLGE GA N+DWL  ++   Q  I+I +G 
Sbjct: 141 YMVSGITRRVSYVRFVLIDFCAAIISVPIWIYLGELGAKNLDWLHTQIQKGQIVIYIFIG 200 

Query: 183 VL 184 
            L 
Sbjct: 201 YL 202 



Based on this analysis, including the presence of putative transmembrane domains, it is predicted 
that these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies. 
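
The text does not say which program predicted the transmembrane domains. As an illustration only, the following sketch uses a sliding-window Kyte-Doolittle hydropathy average, a swapped-in and much simpler technique than a dedicated predictor. The function names, the 19-residue window and the 1.6 cut-off are assumptions made for this example.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(scores) - window + 1)]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows whose mean hydropathy reaches the threshold."""
    return [i + 1 for i, mean in enumerate(hydropathy_profile(seq, window))
            if mean >= threshold]


# First 50 residues of ORF78-1 (SEQ ID 722) as example input.
orf78_1_start = "MFAFLEAFFVEYGYAAVFFVLVICGFGVPIPEDLTLVTGGVISGMGYTNP"
print(candidate_tm_windows(orf78_1_start))
```

Runs of consecutive window positions returned by `candidate_tm_windows` mark hydrophobic stretches that would merit closer inspection; the sketch does not handle ambiguous residues such as X.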

Example 87 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 729>: 

1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 

351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA C... 

This corresponds to the amino acid sequence <SEQ ID 730; ORF79>: 



1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA 
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNH... 

Further work revealed the complete nucleotide sequence <SEQ ID 731>: 



1 ATGAAAAAAT TATTGGCGGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 
51 TTCCGCCGCC GGAGTCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 
101 AAGGTATGAA AATAGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 
151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCCGTTGCCG ACCGCGTCGA 
201 AGTGCATACC CACATCAACG ACAACGGCGT GATGCGGATG CGCGAAGTCG 
251 AAGGCGGCGT GCCTTTGGAA GCGAAATCCG TTACCGAACT CAAACCCGGC 
301 AGCTATCATG TGATGTTTAT GGGTTTGAAA AAACAATTAA AAGAGGGCGA 
351 TAAAATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 
401 AACTGGAAGT CAAAATCGCG CCGATGCCGG CAATGAACCA CGGTCATCAC 
451 CACGGCGAAG CGCATCAGCA CTAA 

This corresponds to the amino acid sequence <SEQ ID 732; ORF79-1>: 



1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKIGG AFMKIHNDEA 
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKIA PMPAMNHGHH 
151 HGEAHQH* 
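
The correspondence between a nucleotide sequence and its amino acid sequence can be checked by translating in frame 1 with the standard genetic code. The sketch below is illustrative only: the function name `translate` is hypothetical, and use of the standard code is assumed, since the text does not name a translation table.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate frame 1; codons containing non-ACGT symbols become 'X'."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, usable, 3))


# First 30 and last 24 nucleotides of the complete sequence <SEQ ID 731>.
print(translate("ATGAAAAAATTATTGGCGGCCGTGATGATG"))  # MKKLLAAVMM
print(translate("CACGGCGAAGCGCATCAGCACTAA"))        # HGEAHQH*
```

Both printed fragments agree with the first and last rows of the ORF79-1 amino acid sequence above, with `*` marking the stop codon.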

Computer analysis of this amino acid sequence revealed a putative leader peptide and also gave the 
following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF79 shows 94.6% identity over a 147aa overlap with an ORF (ORF79a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60 
orf79.pep   MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS 
            || ||||||||||||||||||:|||||||||||||||:|||||||||||||||||||||| 
orf79a      MKXLLAAVMMAGLAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS 
                    10        20        30        40        50        60 

                    70        80        90       100       110       120 
orf79.pep   PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 
            |||||||||||||||||||||||||||||||||||||||||||||||| ||||| ||||| 
orf79a      PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP 
                    70        80        90       100       110       120 

                   130       140 
orf79.pep   VTLKFKNAKAQTVQLEVKIAPMPAMNH 
            |||||||||||||||||| ||| ||:| 
orf79a      VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX 
                   130       140       150 

The complete length ORF79a nucleotide sequence <SEQ ID 733> is: 

1 ATGAAANAAC TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTCCGCCGCC GGAATCCACG TTGAGGACGG CTGGGCGCGC ACCACCGTCG 

101 AAGGTATGAA AATGGGCGGC GCGTTCATGA AAATCCACAA CGACGAAGCC 

151 AAACAAGACT TTTTGCTCGG CGGAAGCAGC CCTGTTGCCG ACCGCGTCGA 

201 AGTGCATACC CATATCAATG ATAACGGTGT GATGCGGATG CGCGAAGTCG 

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCATG TCATGTTTAT GGGTNTGAAA AAACAATTAA AAGANGGCGA 

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCA CAAACCGTCC 

401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGGACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA 

This encodes a protein having amino acid sequence <SEQ ID 734>: 

1 MKXLLAAVMM AGLAGAVSAA GIHVEDGWAR TTVEGMKMGG AFMKIHNDEA 
51 KQDFLLGGSS PVADRVEVHT HINDNGVMRM REVEGGVPLE AKSVTELKPG 
101 SYHVMFMGXK KQLKXGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMDHGHH 
151 HGEAHQH* 

ORF79a and ORF79-1 show 94.9% identity in 157 aa overlap: 

                    10        20        30        40        50        60 
orf79a.pep  MKXLLAAVMMAGLAGAVSAAGIHVEDGWARTTVEGMKMGGAFMKIHNDEAKQDFLLGGSS 
            || ||||||||||||||||||:|||||||||||||||:|||||||||||||||||||||| 
orf79-1     MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS 
                    10        20        30        40        50        60 

                    70        80        90       100       110       120 
orf79a.pep  PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGXKKQLKXGDKIP 
            |||||||||||||||||||||||||||||||||||||||||||||||| ||||| ||||| 
orf79-1     PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 
                    70        80        90       100       110       120 

                   130       140       150 
orf79a.pep  VTLKFKNAKAQTVQLEVKTAPMSAMDHGHHHGEAHQHX 
            |||||||||||||||||| ||| ||:|||||||||||| 
orf79-1     VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX 
                   130       140       150 

Homology with a predicted ORF from N. gonorrhoeae 

ORF79 shows 96.1% identity over 76 aa overlap with a predicted ORF (ORF79ng) from 
N. gonorrhoeae: 

orf79.pep   FMKIHNDEAKQDFLLGGSSPVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGS 101 
                                          ||||||||||||:||||||||||||||||| 
orf79ng                                   INDNGVMRMREVKGGVPLEAKSVTELKPGS 30 

orf79.pep   YHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKIAPMPAMNH 147 
            ||||||||||||||||||||||||||||||||||||| ||| |||| 
orf79ng     YHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQH 86 

An ORF79ng nucleotide sequence <SEQ ID 735> was predicted to encode a protein comprising 
amino acid sequence <SEQ ID 736>: 

1 . . INDNGVMRMR EVKGGVPLEA KSVTELKPGS YHVMFMGLKK QLKEGDKIPV 
51 TLKFKNAKAQ TVQLEVKTAP MSAMNHGHHH GEAHQH* 

Further work revealed the complete gonococcal DNA sequence <SEQ ID 737>: 

1 ATGAAAAAAT TATTGGCAGC CGTGATGATG GCAGGTTTGG CAGGCGCGGT 

51 TTccgccgCc GGagTccAtG TCGAggACGG CTGGGCGCGc accaCTGtcg 

101 aaggtATgaa aatggGCGGC GCgttCATga aaATCCACAA CGACGaaGcc 

151 atacaaGACt ttgtgcTCgg CGGaagcatg cccgttgccg accgcGTCGA 

201 AGTGCAtaca cacATCAACG ACAACGGCGT GATGCGTATG CGCGAAGTCA 

251 AAGGCGGCGT GCCTTTGGAG GCGAAATCCG TTACCGAACT CAAACCCGGC 

301 AGCTATCACG TGATGTTTAT GGGTTTGAAA AAACAACTGA AAGAGGGCGA 

351 CAAGATTCCC GTTACCCTGA AATTTAAAAA CGCCAAAGCG CAAACCGTCC 

401 AACTGGAAGT CAAAACCGCG CCGATGTCGG CAATGAACCA CGGTCATCAC 

451 CACGGCGAAG CGCATCAGCA CTAA 

This corresponds to the amino acid sequence <SEQ ID 738; ORF79ng-1>: 

1 MKKLLAAVMM AGLAGAVSAA GVHVEDGWAR TTVEGMKMGG AFMKIHNDEA 
51 IQDFVLGGSM PVADRVEVHT HINDNGVMRM REVKGGVPLE AKSVTELKPG 
101 SYHVMFMGLK KQLKEGDKIP VTLKFKNAKA QTVQLEVKTA PMSAMNHGHH 
151 HGEAHQH* 

ORF79ng-1 and ORF79-1 show 95.5% identity in 157 aa overlap: 

                    10        20        30        40        50        60 
orf79-1.pep MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS 
            |||||||||||||||||||||||||||||||||||||:|||||||||||| |||:|||| 
orf79ng-1   MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSM 
                    10        20        30        40        50        60 

                    70        80        90       100       110       120 
orf79-1.pep PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 
            |||||||||||||||||||||||:|||||||||||||||||||||||||||||||||||| 
orf79ng-1   PVADRVEVHTHINDNGVMRMREVKGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP 
                    70        80        90       100       110       120 

                   130       140       150 
orf79-1.pep VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQHX 
            |||||||||||||||||| ||| ||||||||||||||| 
orf79ng-1   VTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQHX 
                   130       140       150 
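
The identity percentages quoted for these alignments are the number of identical aligned positions divided by the overlap length. A minimal sketch, assuming two gap-free aligned strings of equal length (the function name `percent_identity` is hypothetical, and the alignment program used in the text is not reproduced here):

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length aligned strings agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * matches / len(a)


# C-terminal 37 residues of ORF79-1 and ORF79ng-1 (stop codon excluded).
orf79_1_tail = "VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQH"
orf79ng_1_tail = "VTLKFKNAKAQTVQLEVKTAPMSAMNHGHHHGEAHQH"
print(f"{percent_identity(orf79_1_tail, orf79ng_1_tail):.1f}% identity")
```

Over the full 157-residue overlap the two proteins differ at 7 positions, so 150/157 gives the 95.5% stated above.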

Furthermore, ORF79ng-1 shows significant homology to a protein from Aquifex aeolicus: 

gi|2983695 (AE000731) putative protein [Aquifex aeolicus] Length = 151 
Score = 63.6 bits (152), Expect = 6e-10 
Identities = 38/114 (33%), Positives = 58/114 (50%), Gaps = 1/114 (0%) 

Query: 24  VEDGWARTTVEGMKMGGAFMKIHNDEAIQDFVLGGSMPVADRVEVHTHINDNGVMRMREV 83 
           V+  W      G       M I N+    D+++G    +A RVE+H  + +N V +M 
Sbjct: 27  VKHPWVMEPPPGPNTTMMGMIIVNEGDEPDYLIGAKTDIAQRVELHKTVIENDVAKMVPQ 86 

Query: 84  KGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIPVTLKFKNAKAQTVQLEV 137 
           +  + +  K   E K   YHVM +GLKK++KEGDK+ V L F+ +   TV+  V 
Sbjct: 87  ER-IEIPPKGKVEFKHHGYHVMIIGLKKRIKEGDKVKVELIFEKSGKITVEAPV 139 

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, and 
their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF79-1 (15.6 kDa) was cloned in the pET vector and expressed in E. coli, as described above. The 
products of protein expression and purification were analyzed by SDS-PAGE. Figure 18A shows 
the results of affinity purification of the His-fusion protein. Purified His-fusion protein was used 
to immunise mice, whose sera were used for ELISA (positive result) and FACS analysis (Figure 
18B). These experiments confirm that ORF79-1 is a surface-exposed protein, and that it is a useful 
immunogen. 
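
A molecular mass such as the 15.6 kDa quoted for ORF79-1 can be estimated from the amino acid sequence by summing average residue masses and adding one water. The sketch below is illustrative: the text does not state how its figure was obtained, nor whether the leader peptide or the His tag was counted, so the estimate need not match it. The function name `average_mass` is hypothetical.

```python
# Average residue masses in daltons (amino acid minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def average_mass(seq):
    """Approximate average mass in Da of an unmodified linear polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


# ORF79-1 (SEQ ID 732) without the stop symbol, written as three segments.
orf79_1 = (
    "MKKLLAAVMMAGLAGAVSAAGVHVEDGWARTTVEGMKIGGAFMKIHNDEAKQDFLLGGSS"
    "PVADRVEVHTHINDNGVMRMREVEGGVPLEAKSVTELKPGSYHVMFMGLKKQLKEGDKIP"
    "VTLKFKNAKAQTVQLEVKIAPMPAMNHGHHHGEAHQH"
)
print(f"{len(orf79_1)} residues, about {average_mass(orf79_1) / 1000:.1f} kDa")
```

Removing a predicted leader peptide from the input, or appending tag residues, changes the estimate accordingly.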

Example 88 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
739>: 

1 ATGACGGTAA CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 
51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 
101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 
151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 
201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 
251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG 
301 CGGATTCCGG TTGTGAAAtC CATCTATTCG AGTGTGAAAA AAGTATCCGA 
351 ATacgTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 
401 CGTTTCCCCA GCCCGGTATT TGGACGATyG CTTTCGTGTC AGGGCAGGTG 
451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAs GACGGCGATT ATCTTTCCGT 
501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 
551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AsCATTGAAA 
601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 
651 ATTGGCAsGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 
701 AA 

This corresponds to the amino acid sequence <SEQ ID 740; ORF98>: 

1 MTVTAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL 
51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG 
101 RIPVVKSIYS SVKKVSEYVL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV 
151 SNAVKAALPX DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEXLK 
201 YVISLGMVIP DDLPVKTLAX PMPSEKADLP EQQ* 
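
The nucleotide sequence above contains lower-case ambiguity symbols (for example y and s), and the amino acid sequence shows X at some of the affected codons. A hedged sketch of one way such codons can be resolved, assuming the IUPAC nucleotide codes and the standard genetic code (the function name `translate_ambiguous_codon` is hypothetical):

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_ambiguous_codon(codon):
    """Return the residue if every expansion of the codon agrees, otherwise 'X'."""
    options = product(*(IUPAC[base] for base in codon.upper()))
    residues = {CODON_TABLE["".join(opt)] for opt in options}
    return residues.pop() if len(residues) == 1 else "X"


print(translate_ambiguous_codon("ATy"))  # ATC and ATT both encode I
print(translate_ambiguous_codon("AAs"))  # AAC (N) or AAG (K), so X
```

This is consistent with the listing: the codon ATy in row 401 corresponds to I in WTIAF, while the codon AAs in row 451 corresponds to the X in ALPX.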

Further work revealed the complete nucleotide sequence <SEQ ID 741>: 

1 ATGACGGAAC nTGCGGCCGA AGGCGGCAAA GCTGCCAArG CGTTAAAAAA 
51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 
101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 
151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 
201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTGTTTGCCG 
251 CCAACGTATT GGGTCGGCAG ATCCTCGCCG CGTGGGACAG CCTGTTGGGG 
301 CGGATTCCGG TTGTGAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA 
351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 
401 CGTTTCCCCA GCCCGGTATT TGGACGATTG CTTTCGTGTC AGGGCAGGTG 
451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT 
501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 
551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCATTGAAA 
601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 
651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 
701 AA 

This corresponds to the amino acid sequence <SEQ ID 742; ORF98-1>: 

1 MTEXAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL 
51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG 
101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQPGI WTIAFVSGQV 
151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 
201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF98 shows 96.1% identity over a 233aa overlap with an ORF (ORF98a) from strain A of N. 
meningitidis: 

                    10        20        30        40        50        60 
orf98.pep   MTVTAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 
            ||  |||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf98a      MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 
                    10        20        30        40        50        60 

                    70        80        90       100       110       120 
orf98.pep   GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSEYVL 
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||  :| 
orf98a      GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSXSLL 
                    70        80        90       100       110       120 

                   130       140       150       160       170       180 
orf98.pep   SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY 
            ||||||||||||||||| ||||||||||||||||||||| |||||||||||||||||||| 
orf98a      SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 
                   130       140       150       160       170       180 

                   190       200       210       220       230 
orf98.pep   IMVKKSDVRELDMSVDEXLKYVISLGMVIPDDLPVKTLAXPMPSEKADLPEQQX 
            ||||||||||||||||| ||||||||||||||||||||| |||||||||||||| 
orf98a      IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 
                   190       200       210       220       230 

The complete length ORF98a nucleotide sequence <SEQ ID 743> is: 

1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACGGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ATCAGCTCGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCGGGGCT 

201 GGGCGTTATC GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG 

251 CAAACGTATT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CTTGTTGGGG 

301 CGGATTCCGG TTGTGAAGTC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 NTCGTTGCTG TCCGACAGCA GCCGTTCGTT TAAAACACCA GTACTCGTGC 

401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGAAG GACGGCGATT ATCTTTCCGT 

501 GTATGTTCCG ACCACGCCGA ATCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGT CTGAAAAGGC GGATTTGCCC GAACAACAAT 

701 AA 

This encodes a protein having amino acid sequence <SEQ ID 744>: 

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL 
51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG 
101 RIPVVKSIYS SVKKVSXSLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV 
151 SNAVKAALPK DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 
201 YVISLGMVIP DDLPVKTLAG PMPSEKADLP EQQ* 

ORF98a and ORF98-1 show 98.7% identity in 233 aa overlap: 

                    10        20        30        40        50        60 
orf98a.pep  MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 
            ||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf98-1     MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 
                    10        20        30        40        50        60 

                    70        80        90       100       110       120 
orf98a.pep  GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSXSLL 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||| ||| 
orf98-1     GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL 
                    70        80        90       100       110       120 

                   130       140       150       160       170       180 
orf98a.pep  SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 
            ||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||| 
orf98-1     SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 
                   130       140       150       160       170       180 

                   190       200       210       220       230 
orf98a.pep  IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf98-1     IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 
                   190       200       210       220       230 

Homology with a predicted ORF from N. gonorrhoeae 

ORF98 shows 95.3% identity over a 233 aa overlap with a predicted ORF (ORF98ng) from 
N. gonorrhoeae: 

                    10        20        30        40        50        60 
orf98.pep   MTVTAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 60 
            ||  |||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf98ng     MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 60 

orf98.pep   GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSEYVL 120 
            ||||||||||||||||||||||||||||||||||||||| |||||||||||||||||  :| 
orf98ng     GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLXRIPVVKSIYSSVKKVSESLL 120 

orf98.pep   SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPXDGDYLSVYVPTTPNPTGGYY 180 
            ||||||||||||||||| ||||||||||||||||||||| |||||||||||||||||||| 
orf98ng     SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY 180 

orf98.pep   IMVKKSDVRELDMSVDEXLKYVISLGMVIPDDLPVKTLAXPMPSEKADLPEQQ 233 
            ||||||||||||||||| ||||||||||||||||||||| ||| |||:||||| 
orf98ng     IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQ 233 

The complete length ORF98ng nucleotide sequence <SEQ ID 745> is predicted to encode a protein 
having amino acid sequence <SEQ ID 746>: 

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL 
51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLX 
101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV 
151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 
201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ* 

Further work revealed the complete nucleotide sequence <SEQ ID 747>: 

1 ATGACGGAAC CTGCGGCCGA AGGCGGCAAA GCTGCCAAGG CGTTAAAAAA 

51 ATATCTGATT ACAGGCATTT TGGTCTGGCT GCCGATTGCG GTAACGGTTT 

101 GGGTGGTTTC CTATATCGTT TCCGCGTCCG ACCAGCTTGT CAACCTGCTG 

151 CCGAAGCAAT GGCGGCCGCA ATATGTTTTG GGGTTTAATA TCCCCGGGCT 

201 CGGCGTTATT GTTGCCATTG CCGTATTGTT TGTAACCGGA TTATTTGCCG 

251 CAAACGTGTT GGGCCGGCAG ATTCTTGCCG CGTGGGACAG CCTGTTgggg 

301 cggaTTCCGG TTGTCAAATC CATCTATTCG AGTGTGAAAA AAGTATCCGA 

351 ATCGCTGCTG TCCGACAGCA GCCGTTCGTT TAAAACGCCG GTACTCGTGC 

401 CGTTTCCCCA ATCGGGTATT TGGACAATCG CATTCGTGTC CGGTCAGGTG 

451 TCGAATGCGG TTAAGGCCGC ATTGCCGCAG GATGGCGATT ATCTTTCCGT 

501 GTATGTCCCG ACCACGCCCA ACCCGACCGG CGGTTACTAT ATTATGGTAA 

551 AGAAAAGCGA TGTGCGCGAA CTCGATATGA GCGTGGACGA AGCGTTGAAA 

601 TATGTGATTT CGCTGGGTAT GGTCATCCCT GACGACCTGC CCGTCAAAAC 

651 ATTGGCAGGA CCTATGCCGC CTGAAAAGGC GGAGTTGCCC GAACAACAAT 

701 AA 

This corresponds to the amino acid sequence <SEQ ID 748; ORF98ng-1>: 

1 MTEPAAEGGK AAKALKKYLI TGILVWLPIA VTVWVVSYIV SASDQLVNLL 
51 PKQWRPQYVL GFNIPGLGVI VAIAVLFVTG LFAANVLGRQ ILAAWDSLLG 
101 RIPVVKSIYS SVKKVSESLL SDSSRSFKTP VLVPFPQSGI WTIAFVSGQV 
151 SNAVKAALPQ DGDYLSVYVP TTPNPTGGYY IMVKKSDVRE LDMSVDEALK 
201 YVISLGMVIP DDLPVKTLAG PMPPEKAELP EQQ* 



ORF98ng-1 and ORF98-1 show 97.9% identity in 233 aa overlap: 

                    10        20        30        40        50        60 
orf98-1.pep MTEXAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 
            ||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf98ng-1   MTEPAAEGGKAAKALKKYLITGILVWLPIAVTVWVVSYIVSASDQLVNLLPKQWRPQYVL 
                    10        20        30        40        50        60 

                    70        80        90       100       110       120 
orf98-1.pep GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL 
            |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 
orf98ng-1   GFNIPGLGVIVAIAVLFVTGLFAANVLGRQILAAWDSLLGRIPVVKSIYSSVKKVSESLL 
                    70        80        90       100       110       120 

                   130       140       150       160       170       180 
orf98-1.pep SDSSRSFKTPVLVPFPQPGIWTIAFVSGQVSNAVKAALPKDGDYLSVYVPTTPNPTGGYY 
            ||||||||||||||||| |||||||||||||||||||||:|||||||||||||||||||| 
orf98ng-1   SDSSRSFKTPVLVPFPQSGIWTIAFVSGQVSNAVKAALPQDGDYLSVYVPTTPNPTGGYY 
                   130       140       150       160       170       180 

                   190       200       210       220       230 
orf98-1.pep IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPSEKADLPEQQX 
            ||||||||||||||||||||||||||||||||||||||||||| |||:|||||| 
orf98ng-1   IMVKKSDVRELDMSVDEALKYVISLGMVIPDDLPVKTLAGPMPPEKAELPEQQX 
                   190       200       210       220       230 



Based on this analysis, including the fact that the putative transmembrane domains in the 
gonococcal protein are identical to the sequences in the meningococcal protein, it is predicted that 
the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens 
for vaccines or diagnostics, or for raising antibodies. 



Example 89 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 749>: 

1 ATgAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GsGgTACTCA 

201 ATATCCCCGA AAAGATGCAG CGTTTCGGTT CGGCnCGTAA AGGCCkCAAG 

251 ssCGsGCTTG CCTTGAACAA GGCGGGTTTG GCGTATTTTG AAGGGCGTTT 

301 TGAAAAGGCG GAACTAGAAG CCTCACGCGT GTTGGTCAAC AAAGtAGGCC 
351 GaGAGACAAC CGGACTTTGG CATTGATGCT GrGCGCGCAC GCCGCCGGAC 

401 AGATGGAAAA CATCGAssTG CGCGACCGTT ATCTTGCGGA AATCGCCAAA 
451 CTGCCGGAAA AACAGCAGCT TTCCCGTTAT CTTTTGTTGG CGGAATCGGC 
501 GTTGAACCGG CGCGATTACG AAGCGGCGGA AGCCAATCTT CATGCGGCGG 
551 CGAAGATGAA TGCCAACCTT ACGCGCCTCG TGCGTCTGCA . ATTCGTTAC 
601 . GCTTTCGACA GGGGCGACGC GTTGCAGGTT CTGGCAAAAA CCGAAAAACT 
651 TTCCAAGGCG GGCGCGTTGG GCAAATCGGA AATGGAACGG TATCAAAATT 

701 GGGCATAT£C GTCGCCAGCT GGCGGATGCT GCCGATGCCG CCGCTTTGAA 
751 AACCTGCCTG AAGCGGATTC CCGACAGCCT CAAAAACGGG GAATTGAGCG 
801 TATCGGTTGC GGAAAAGTAC GAACGTTTGG GACTGTATGC CGATGCGGTC 

851 AAATGGGTCA AACAGCATTA TCCGCAsAAC CGCCGCCCCG AGCTTTTGGA 
901 AGCCTTTGTC GAAAGCGTGC GCTTTTTGGG CGAGCGCGAA CAGCAGAAAG 
951 CCATCGATTT TGCCGATGCT TGGCTGAAAG AACAGCCCGA TAACGCGCTT 

1001 CTGCTGATGT ATCTCGGTCG GCTCGCCTTC GGCCGCAAAC TTTGGGGCAA 

1051 GGCAAAAGGC TACCTTGAAG CGAGCATTGC ATTAAAGCCG AGTATTTCCG 

1101 CGCGTTTGGT TCTAACAAAG GTTTTCGACG AAATCGGAGA ACCGCAGAAG 

1151 GCGGAGGCGC AC... 

This corresponds to the amino acid sequence <SEQ ID 750; ORF100>: 

1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI 
51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GXKXXLALNK AGLAYFEGRF 
101 EKAELEASRV LVNKVGRDNR TLALMLXAHA AGQMENIXXR DRYLAEIAKL 
151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLXIRYA 
201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT 
251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP XNRRPELLEA 
301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AFGRKLWGKA 
351 KGYLEASIAL KPSISARLVL TKVFDEIGEP QKAEAH... 

Further work revealed the complete nucleotide sequence <SEQ ID 751>: 



1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CCGTCGGACT 

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CTTGTTTAAA TTCATTATCG GCGTACTCAA 

201 TATCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG 

251 CCGCGCTTGC CTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT 

301 GAAAAGGCGG AACTAGAAGC CTCACGCGTG TTGGTCAACA AAGAGGCCGG 

351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCC GCCGGACAGA 

401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG 

451 CCGGAAAAAC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT 

501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA 

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT 

601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAACTTTC 

651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 CATACCGCCG CCAGCTGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC 

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC 

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT 

951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC 

1001 TGATGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA 

1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG 

1101 TTTGGTTCTA GCAAAGGTTT TCGACGAAAT CGGAGAACCG CAGAAGGCGG 

1151 AGGCGCAGCG CAACTTGGTT TTGGAAGCCG TCTCCGATGA CGAACGTCAC 

1201 GCAGCGTTAG AGCAGCATAG CTGA 

This corresponds to the amino acid sequence <SEQ ID 752; ORF100-1>: 



1 MKTVVWIVVL FAAAVGLALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI 
51 AVVVWYFLFK FIIGVLNIPE KMQRFGSARK GRKAALALNK AGLAYFEGRF 
101 EKAELEASRV LVNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 
151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 
201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQLA DAADAAALKT 
251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 
301 FVESVRFLGE REQQKAIDFA DAWLKEQPDN ALLLMYLGRL AYGRKLWGKA 
351 KGYLEASIAL KPSISARLVL AKVFDEIGEP QKAEAQRNLV LEAVSDDERH 
401 AALEQHS* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF100 shows 93.5% identity over a 386aa overlap with an ORF (ORF100a) from strain A of 
N. meningitidis: 



10 20 30 40 50 60 

orflOO.pep MKTWWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAWVWYFLFK 
I I I I I I I I I I M I I I I I 1 I I I I I I I I M I I I M I I I I I I I I I I I I I I I I I | | | | | | | | 
orflOOa MKTVVWIWLFAAAXGLALASGIXTGDVYIVLGQTMLRINLHAFVLGSLIAWVWYFLFK 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 100 .pep FIIGVLNIPEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 
I I I I 1 I I I I I I I I I I I I I I I I M I I I I I I I I I M III I I I I M I I i I II : ill 
orf 100a FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 1 0 0 . pep TLALMLXAHAAGQMENIXXRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

, lnn I I I I I I || I I || I II I I I I II I I I I I II I II 

orf 1 00a TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
130 140 150 160 170 180 

190 200 210 220 230 240

orf100.pep AAAKMNANLTRLVRLXIRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA

| M | | | | | | | | M I I : I I I I I I I I I I 1 I II I 1 I I I I I I I I I I I I I I 1 1 I I I I 1 I M 
orflOOa AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXSKAGAXGKSEMERYQNWAYRRQLX 
190 200 210 220 230 240 

250 260 270 280 290 300 

orflOO pep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 
| | | | I I I I I 1 I I II I I I I I I I I I I I I I I I I M I I I I I I I 1 I I I I I I I I I I I I I I I I I I I 
o r f 1 0 Oa DAADAAALKTCLKRI PD SLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

250 260 270 280 290 300 

310 320 330 340 350 360 

orflOO pep FVE SVRFLGEREQQKAI DFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEAS I AL 

| | I I I I I I I I I : I I I I I i I I I I I M I I I I I I I I I 111111:11 I I I I I I I I 

orflOOa FVESVRFLGERDQQKAIDFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEASIAL 

310 320 330 340 350 360 

370 380 
orflOO. pep KPSISARLVLTKVFDEIGEPQKAEAH 
I I I I I I I I I I : I I I I I 1 I I I I M I : 
orflOOa KPSISARLVLAKVFDETGEPQKAEAQRNLVLASVAEENRPSAETHX 

370 380 390 400 

The complete length ORF100a nucleotide sequence <SEQ ID 753> is:

1 ATGAAAACGG TAGTCTGGAT TGTCGTCCTG TTTGCCGCCG CNNTCGGGCT 

51 GGCATTGGCG TCGGGCATTN ACACCGGCGA CGTGTATATC GTACTCGGAC

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT

151 GCCGTCGTGG TGTGGTATTT CCTGTTCAAA TTCATCATCG GCGTACTCAA 

201 TANCCCCGAA AAGATGCAGC GTTTCGGTTC GGCGCGTAAA GGCCGCAAGG

251 CCGCGCTTGC TTTGAACAAG GCGGGTTTGG CGTATTTTGA AGGGCGTTTT

301 GAAAAGGCGG AACTTGAAGC CTCGCGCGTA TTGGGAAACA AAGAGGCGGG 

351 GGATAACCGG ACTTTGGCAT TGATGTTGGG CGCACATGCC GCCGGGCAGA 

401 TGGAAAACAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG

451 CCGGAAAAGC AGCAGCTTTC CCGTTATCTT TTGTTGGCGG AATCGGCGTT

501 GAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCT 

601 TTCGACAGGG GCGACGCGTT GCAGGTTCTG GCAAAAACCG AAAAANTTTC 

651 CAAGGCGGGC GCGTNGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 CATACCGCCG CCAGCTGNCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC 

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGAGCGTATC 

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT

851 GGGTCAAACA GCATTATCCG CACAACCGCC GACCCGAACT TTTGGAAGCN 

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAA CGCGATCAGC AGAAAGCCAT 

951 CGATTTTGCC GATGCTTGGC TGAAAGAACA GCCCGATAAT GCGCTTCTGC

1001 TGANGTATCT CGGTCGGCTC GCCTACGGCC GCAAACTTTG GGGCAAGGCA

1051 AAAGGCTACC TTGAAGCGAG CATTGCATTA AAGCCGAGTA TTTCCGCGCG

1101 TTTGGTTCTG GCAAAGGTTT TTGACGAAAC CGGAGAACCG CAGAAGGCGG 

1151 AGGCGCAGCG CAACTTGGTT TTGGCAAGCG TTGCCGAGGA AAACCGNCCT 

1201 TCCGCCGAAA CCCATTGA 

This encodes a protein having amino acid sequence <SEQ ID 754>: 



1 MKTWWIVVL FAAAXGLALA SGIXTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AWVWYFLFK FIIGV LNXPE KMQRFGSARK GRKAALALNK AGLAYFEGRF 

101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAE I AKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 

201 FDRGDALQVL AKTEKXSKAG AXGKSEMERY QNWAYRRQLX DAADAAALKT 

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 

301 FVE SVRFLGE RDQQKAIDFA DAWLKEQPDN ALLLXYLGRL AYGRKLWGKA 

351 KGYLEASIAL KPSISARLVL AKVFDETGEP QKAEAQRNLV LASVAEENRP 

401 SAETH* 

ORF100a and ORF100-1 show 95.1% identity in 406 aa overlap:



10 20 30 40 50 60 

orflOOa. pep MKTVVWIVVLFAAAXGLALASGIXTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I II II II II 
orf100-1 MKTVVWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK
10 20 30 40 50 60



orf 100a. pep 



70 80 90 100 110 120 

FIIGVLNXPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 

| | | | | | | j| [ | | | I I I I I I I II 1 I I I I I 1 1 I I I I M I 1 I M I I I I I II II I I I I I I II 
FIIGVLNIPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 
70 80 90 100 110 120 

130 140 150 160 170 180 

T LALMLGAHAAGQMEN I E LRDRYL AE I AKL PEKQQLSRYLLLAE S ALNRRD YE AAE ANLH 
| | | | | | | | I I I I I I I I I I i I I M I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I i I I I I M I 
TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 

130 140 150 160 170 180 

190 200 210 220 230 240 

AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKXSKAGAXGKSEMERYQNWAYRRQLX 

| | || | | | | | I I I I I I I I I 1 I I I I I I M I II 1 I II I Mill I I I I I I II I I I I M I I I 
AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 
190 200 210 220 230 240 

250 260 270 280 290 300 

DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

| | | | | | || | I I I I I I I I I I I I I I II I II I I I I i I I I I II I I I I I I I M I I I I I I I I I I I I 
DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 
250 260 270 280 290 300 

310 320 330 340 350 360 

FVESVRFLGERDQQKAIDFADAWLKEQPDNALLLXYLGRLAYGRKLWGKAKGYLEASIAL 

I | | | | I I I M I : I I I I II I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
FVESVRFLGEREQQKAI DFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEAS IAL 
310 320 330 340 350 360 

370 380 390 400 

KPS I SARLVLAKVFDETGEPQKAEAQRNLVLASVAEENRPSA-ETHX 

K P S I S ARLVLAKV F DE I GE PQKAE AQRNLVLE AV S DDE RHAALE QH S X 
370 380 390 400 



Homology with a predicted ORF from N. gonorrhoeae

ORF100 shows 93.3% identity over a 386 aa overlap with a predicted ORF (ORF100ng) from N. gonorrhoeae:

orf 100 .pep 
orflOOng 
orflOO .pep 
orflOOng 
orflOO .pep 
orflOOng 
orf 100. pep 
orflOOng 
orflOO .pep 
orflOOng 
orflOO. pep 
orflOOng 



MKTWWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVWWYFLFK 60 
I I I I I I [ I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I M I I I I I I I 
MKTVVWI WLFAAAVGLALASG I YTGDVYI VLGQTMLRINLHAFVLG SL IAVWWYFLFK 6 0 

FIIGVLNIPEKMQRFGSARKGXKXXLALNKAGLAYFEGRFEKAELEASRVLVNKVGRDNR 12 0 
1111111111:1:1 I I I I M I 1 I I I I I I I I I I I I I I I I I I I I I I I M II : III 

FIIGVLNI PENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 120 

TLALMLXAHAAGQMENIXXRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 180 

I I I I I I I I I I I I I I I I I II I I II I M I I I II I I I I I I I I I I I I I I I I I M II I I I II 

T LALMLGAHAAGQMEN IE LRDRYLAE I AKL PEKQQLS RYLL LAE S ALNRRD YE AAE ANLH 180 

AAAKMNANLTRLVRLXIRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA 2 40 

II I I I I I I ! I I I I I I : I I I I I II I I I I I I I I I I I I II II II I I I I I I I I I II II I I I : I 

AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQMA 2 4 0 

DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPXNRRPELLEA 300 
II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

DAADAAALKT CLKRI P DS LKNGEL S V S VAEKYERLGLYADAVKWVKQHY PHNRRPE LLE A 3 00 

FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAFGRKLWGKAKGYLEASIAL 3 60 
I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I 1 I I I I I I I : I I II I I I I II I I I I I I I I 

FVESVRFLGEREQQKAIDFADSWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 360

orf100.pep KPSISARLVLTKVFDEIGEPQKAEAH 386

I I I I I I I I I : I I I M I I I I 1 = 

orflOOng RPSIPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETR 405 

The complete length ORF100ng nucleotide sequence <SEQ ID 755> is:

1 ATGAAAACGG TAGTCTGGAT TGTTGTCCTG TTTGCCGCCG CCGTCGGACT

51 GGCGCTGGCT TCGGGCATTT ACACCGGCGA CGTGTATATC GTACTCGGAC 

101 AGACCATGCT CAGAATCAAC CTGCACGCCT TTGTGTTAGG TTCGCTGATT 

151 GCCGTCGTGG TGTGGTATTT CCTGTTTAAA TTCATCATCG GCGTACTCAA 

201 TATCCCCGAA AATATGCGGC GTTCCGGTTC GGCGCGGAAA GGCCGCAAGG

251 CCGCGCTTGC CTTGAATAAG GCGGGTTTGG CGTATTTCGA AGGGCGTTTT

301 GAAAAGGCGG AACTCGAAGC CTCTCGAGTG TTGGGCAACA AAGAGGCCGG 

351 AGACAACCGG ACTTTGGCAT TGATGCTGGG CGCGCACGCG GCAGGACAGA 

401 TGGAAAATAT CGAGCTGCGC GACCGTTATC TTGCGGAAAT CGCCAAACTG 

451 CCGGAAAAAC AGCAGCTTTC CCGCTATCTT CTGCTGGCGG AATCGGCGTT 

501 AAACCGGCGC GATTACGAAG CGGCGGAAGC CAATCTTCAT GCGGCGGCGA

551 AGATGAATGC CAACCTTACG CGCCTCGTGC GTCTGCAACT TCGTTACGCC 

601 TTCGATCGGG GCGATGCGTT GCAGGTTCTG GCAAAAaccG AAAAACTTTC 

651 CAAGGCGGGC GCGTTGGGCA AATCGGAAAT GGAACGGTAT CAAAATTGGG 

701 CATACCGCCG CCAGATGGCG GATGCTGCCG ATGCCGCCGC TTTGAAAACC

751 TGCCTGAAGC GGATTCCCGA CAGCCTCAAA AACGGGGAAT TGagcGTATC

801 GGTTGCGGAA AAGTACGAAC GTTTGGGACT GTATGCCGAT GCGGTCAAAT 

851 GGGTCAAACA GCATTATCCG CACAACCGCC GCCCCGAGCT TTTGGAAGCC 

901 TTTGTCGAAA GCGTGCGCTT TTTGGGCGAG CGCGAACAGC AGAAAGCCAT 

951 CGATTTTGCC GATTCTTGGC TGAAAGAACA GCCCGATAAC GCGCTTCTGC 

1001 TGATGTATCT CGGCCGGCTC GCCTACGGCC GCAAACTTTG GGGTAAGGCA

1051 AAAGGCTACC TTGAAGCGAG TATTGCACTG AAGCCGAGTA TTCCGGCGCG 

1101 TTTGGTGTTG GCAAAGGTTT TTGACGAAAC CGCACAGTCG CAAAAAGCCG 

1151 AAGCACAGCG CAACTTGGTT TTGGCAAGCG TTGCCGGGGA AAACCGCCCT 

1201 TCCGCCGAAA CCCGTTGA 

This encodes a protein having amino acid sequence <SEQ ID 756>:

1 MKTVVWIVVL FAAAVG L ALA SGIYTGDVYI VLGQTMLRIN LHAFVLGSLI 

51 AVVVWYFLFK FIIGV LNIPE NMRRSGSARK GRKAALALNK AGLAYFEGRF 

101 EKAELEASRV LGNKEAGDNR TLALMLGAHA AGQMENIELR DRYLAEIAKL 

151 PEKQQLSRYL LLAESALNRR DYEAAEANLH AAAKMNANLT RLVRLQLRYA 

201 FDRGDALQVL AKTEKLSKAG ALGKSEMERY QNWAYRRQMA DAADAAALKT

251 CLKRIPDSLK NGELSVSVAE KYERLGLYAD AVKWVKQHYP HNRRPELLEA 

301 FVESVRFLGE REQQKAIDFA DSWLKEQPDN ALLLMYLGRL AYGRKLWGKA

351 KGYLEASIAL KPSIPARLVL AKVFDETAQS QKAEAQRNLV LASVAGENRP

401 SAETR*

ORF100ng and ORF100-1 show 95.3% identity in 402 aa overlap:

10 20 30 40 50 60 

orf 100-1 . pep MKTWWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAWWYFLFK 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I 1 
orflOOng mKTVVWIWLFAAAVGLALASGIYTGDVYIVLGQTMLRINLHAFVLGSLIAVVVWYFLFK 
10 20 30 40 50 60

70 80 90 100 110 120 

orf 100-1 . pep FIIGVLNIPEKMQRFGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLVNKEAGDNR 

I I I I j i 1 I I I : I : I I I 1 II I I ! I I I I I I II I I I I I I 1 I 1 I 1 I I I M I M I I I I I I II I 
50 orflOOng FIIGVLNIPENMRRSGSARKGRKAALALNKAGLAYFEGRFEKAELEASRVLGNKEAGDNR 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 1 00-1 . pep TLALMLGAHAAGQMENIELRDRYLAEIAKLPEKQQLSRYLLLAESALNRRDYEAAEANLH 
55 I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M 

orflOOng T LALMLGAHAAGQMENIE LRDRYLAE I AKL PEKQQL SRYLLLAE S ALNRRDYE AAEANLH 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf100-1.pep AAAKMNANLTRLVRLQLRYAFDRGDALQVLAKTEKLSKAGALGKSEMERYQNWAYRRQLA

250 260 270 280 290 300

orf 1 00-1 oep DAADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 

| | | | | | | ] | | | | | | | M I I I I I M I 1 I! I I I I I I I I I M I II I I I I I I i I 1 I 

nrfl nOna DA ADAAALKTCLKRIPDSLKNGELSVSVAEKYERLGLYADAVKWVKQHYPHNRRPELLEA 
° g 250 260 270 280 290 300 

310 320 330 340 350 360 

orf 100-1 pep FVESVRFLGEREQQKAIDFADAWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 

|| || || INI II II II Ml Ihl II M II II II I II M I HIM 

orf 10 Oner FVESVRFLGEREQQKAIDFADSWLKEQPDNALLLMYLGRLAYGRKLWGKAKGYLEASIAL 

310 320 330 340 350 360 

370 380 390 400 

orfl00-l pep KPS I S ARLVLAKV FDE I GE PQKAEAQRNLVLEAV S DDERHAALEQH SX 

I I I I I I I I I I I II II : : I II II I M I II : I : : : I : I 
orf 10 On KPSIPARLVLAKVFDETAQSQKAEAQRNLVLASVAGENRPSAETRX 

370 380 390 400 

Based on this analysis, including the presence of a putative leader sequence, a putative 
transmembrane domain, and a RGD motif, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

Example 90 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 757>:

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCT GGGT AC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATsTGGT CGTGTTCAAA CCGTTTTGA

This corresponds to the amino acid sequence <SEQ ID 758; ORF102>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MAMIDVPRGN PEYVRLSGMA 
51 VRLYRFMSPL GFGAVVFGAA IPFAAGWWGS GWVHVKLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNEIPVLLM VAALYXVVFK PF* 
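
The correspondence between the nucleotide listing and the amino acid sequence above can be checked with a short translation routine. The following is a minimal sketch, assuming the standard genetic code and reading frame 1; the names `CODON_TABLE` and `translate` are illustrative and do not come from the application. Any codon containing an ambiguous base (such as the `s` at position 406 of SEQ ID 757) is reported as `X`, which mirrors the `X` at residue 136 of ORF102.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order
# at each of the three positions (assumption: standard table applies).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; unknown codons become 'X'."""
    # Remove the blanks that separate 10-base groups in the listing.
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(0, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        # Codons with ambiguity codes (N, S, ...) are not in the table.
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


# First 21 bases of SEQ ID 757 as printed in the listing.
print(translate("ATGATGTTTT CTTGGTTCAA G"))
# Codons 134-137 of SEQ ID 757, including the ambiguous base 's'.
print(translate("CTGTATsTGGTC"))
```

The stop codon is rendered as `*`, matching the convention used in the protein listings.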

Further work revealed the complete nucleotide sequence <SEQ ID 759>: 

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA
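
The listings in this section print each sequence in groups of ten residues with a leading position number. A contiguous sequence can be recovered by dropping the numeric tokens and joining the rest. The sketch below is illustrative only; `listing_to_sequence` is a hypothetical helper name, and it assumes that every purely numeric token is a position number or margin number rather than sequence data.

```python
def listing_to_sequence(listing: str) -> str:
    """Join the residue groups of a numbered listing into one string."""
    residues = []
    for line in listing.splitlines():
        for token in line.split():
            # Position numbers (and stray margin numbers) are all digits.
            if token.isdigit():
                continue
            residues.append(token)
    return "".join(residues)


# First and last lines of SEQ ID 759 as printed above.
listing = """1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA"""
sequence = listing_to_sequence(listing)
print(len(sequence), sequence[:10], sequence[-3:])
```

Applied to the full listing of SEQ ID 759, the last printed line begins at position 401 and holds 29 bases, so the sequence is 429 bases long, that is, 142 codons plus the stop codon.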

This corresponds to the amino acid sequence <SEQ ID 760; ORF102-1>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MA MIDVPRGN PEYVRLSGMA 
51 VRLYRFMSP L GFGAVVFGAA IPFAAG WWGS GWVHVK LCLG LMLLAYQLYC 
101 GVL LRRFQDY SNAFSHRWYR VFNE IPVLLM VAALYLWFK P F*

Computer analysis of this amino acid sequence gave the following results:

Homology with HP1484 hypothetical integral membrane protein of H. pylori (accession number AE000647)
ORF102 and HP1484 show 33% aa identity in 143 aa overlap:

orfl02 3 FSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPLGF 62 

F W K FH+ VI SW A LFYLPR+FV A + V++ +LY F++ 

HP1484 8 FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGWQIQEK— KLYSFIASPAM 65 

orfl02 63 GAWFGAAI P FAAG WWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWY 119 

G + + + GW+H KL L ++LLAY YC +R + + R+Y 

HP1484 66 GFTLITGILMLLIEPTLFKSGGWLHAKLALWLLLAYHFYCKKCMRELEKDPTRRNARFY 125 

orfl02 120 RVFNEIPXXXXXXXXXXXXFKPF 142 

RVFNE P KPF 
HP1484 126 RVFNEAPTILMILIVILWVKPF 148 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF102 shows 99.3% identity over a 142 aa overlap with an ORF (ORF102a) from strain A of N. meningitidis:

10 20 30 40 50 60 

orf 102 . pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
M M I I I I I I I I I I I I I I I I I II I I i I [ I I I I i I I I | | M | | I | | | | | | | | | | | | | M | | 
orf 102a MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 102 . pep GFGAWFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

I I I I I I I I I I I I I I I I I I I I M I M I I I I I M M M I I I I I I I I I ! I I I II II I I N I I I 

o r f 1 0 2 a GFGAWFGAAI P FAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

70 80 90 100 110 120 



The complete length ORF102a nucleotide sequence <SEQ ID 761> is:

1 ATGATGTTTT CTTGGTTCAA GCTGTTTCAC TTGTTTTTTG TCATTTCGTG 

51 GTTTGCAGGG CTGTTTTACC TGCCGAGGAT TTTCGTCAAT ATGGCGATGA 

101 TTGATGTGCC GCGCGGCAAT CCCGAGTATG TGCGTCTGTC GGGCATGGCG 

151 GTGCGGCTGT ACCGTTTTAT GTCGCCGTTG GGCTTCGGCG CGGTCGTGTT 

201 CGGCGCGGCG ATACCGTTTG CCGCCGGCTG GTGGGGCAGC GGCTGGGTAC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT TGGCTTACCA GTTGTATTGC 

301 GGCGTGCTGC TGCGCCGTTT TCAGGATTAC AGCAATGCTT TTTCACACCG 

351 CTGGTACCGC GTGTTCAACG AAATCCCCGT GCTGCTGATG GTTGCCGCGC 

401 TGTATCTGGT CGTGTTCAAA CCGTTTTGA 

This encodes a protein having amino acid sequence <SEQ ID 762>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVN MA MIDVPRGN PEYVRLSGMA 
51 VRLYRFMSP L GFGAWFGAA IPFAAG WWGS GWVHV KLCLG LMLLAYQLYC 
101 GVLLRRFQDY SNAFSHRWYR VFNE I PVLLM VAALYLWFK P F* 

ORF102a and ORF102-1 show complete identity in 142 aa overlap:

10 20 30 40 50 60 

orf 102a. pep MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 
I I f I I I I I f I I I I I I | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | j j j j j j j 
orf102-1 MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL

orf102a.pep GFGAWFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR
orliuza.p p | ,,,,,,,, i i ,, l ,,,, | | i i i i | | | | i | i ! I | l | | | l | l | I I I I I I I I 



130 140 
orf 102a. pep VFNEI PVLLMVAALYLVVFKPFX 
10 I I I I I I I I i I I I I I I I I I 1 I I I I 

orfl02-l VFNEI PVLLMVAALYLVVFKPFX 

130 140 
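
The identity percentages quoted for these pairwise alignments are the number of identical columns divided by the length of the overlap. The following is a minimal sketch of that calculation, assuming the two sequences are already aligned to equal length and that a gap character `-` never counts as a match; the function name `percent_identity` is hypothetical, and the alignment program used in the application may treat gaps or ambiguous residues differently.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"  # a gap paired with a gap is not a match
    )
    return 100.0 * matches / len(aligned_a)


# Residues 33-40 of ORF102-1 against the same columns of ORF102ng.
print(percent_identity("MIDVPRGN", "MIDAPRGN"))
```

Under these assumptions, one mismatch in a 142-column overlap gives 141/142, which rounds to the 99.3% reported for ORF102 against ORF102a, where the only differing column is the X at residue 136 of ORF102.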

Homology with a predicted ORF from N. gonorrhoeae 
ORF102 shows 97.9% identity over a 142 aa overlap with a predicted ORF (ORF102ng) from N. gonorrhoeae:

orfl02 pep MMFSWFKLFHLFFVISWFAGLFYLPRIFWMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 60 

I M I I I M I I I I I I I I I I I I I II I I I I I M I I I I i •■ I I I I I I I I M I I II I I I 1 I I I I I I 

orfl02ng mmfSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL 60 

20 

orfl02 pep GFGAVVFGAAIPFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 120 

I I I I I I I I I I I I II I I I I I I I I I I I I I I I M I II I I I I I I I I I I I II I I I I M I I I I I I 

orfl02ng GFGAWFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 120 

25 orf 102. pep VFNEIPVLLMVAALYXWFKPF 142 

1 1 1 1 1 1 1 II II 1 1 1 1 1 1 1 1 1 1 

orfl02ng VFNEIPVLLMVAALYLWFKPF 142 

The complete length ORF102ng nucleotide sequence <SEQ ID 763> is: 

1 ATGATGTTTT cttggttcaa gctgtttcac ttgttttttg tcatttcgtg 

51 gtttgcaggg ctgttttacc tgccgaggat tttcgtcaat atggcgatga

101 ttgatgcgcc gcgcggcaat cccgagtatg tgcgcctgtc ggggatggcg 

151 gtgcggttgt accgttttat gtcgcctttg ggtttcggcg cggtcgtgtt 

201 CGGCGCGGCG ATACCGTTTG CCGCcggccg GTGGGGCagc ggctggGTTC 

251 ACGTCAAACT GTGTTTGGGC TTGATGCTCT tggcttatca gttgtattgc 

301 ggcgtgctgc tgcgccgttt tcaggattac agcaatgctt tttcacaccg

351 ctggtaccgc gtgttcaacg aaatccccgt gctgctgatg gttgccgcgc

401 tgtatctggt cgtgttcaaa ccgttttga

This encodes a protein having amino acid sequence <SEQ ID 764>: 

1 MMFSWFKLFH LFFVISWFAG LFYLPRIFVM MAM IDAPRGN PEYVRLSGMA 
51 VRLYRFMSP L GFGAWFGAA IPFAAG RWGS GWVHVK LCLG LMLLAYQLYC

101 GVL LRRFQDY SNAFSHRWYR VFNE IPVLLM VAALYLWFK P F* 

ORF102ng and ORF102-1 show 98.6% identity in 142 aa overlap: 

10 20 30 40 50 60 

orf 102-1 . pep MMFSWFKLFHLFFVISWFAGLFYLPRIFTOMAMIDVPRGNPEYVRLSGMAVRLYRFMSPL 

45 I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I M : I I I I I I II I M I I I I I I I I I I I ! I 

orfl02ng MMFSWFKLFHLFFVISWFAGLFYLPRIFVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPL 
10 20 30 40 50 60 

70 80 90 100 110 120 

50 orf 1 02 -1 . pep GFGAVVFGAAI PFAAGWWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 

1 1 1 1 1 1 1 1 1 11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 

orfl02ng GFGAWFGAAIPFAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFSHRWYR 
70 80 90 100 110 120 

55 130 140 

orf102-1.pep

In addition, ORF102ng shows significant homology to a membrane protein from H. pylori:

gi|2314656 (AE000647) conserved hypothetical integral membrane protein
[Helicobacter pylori] Length = 148
Score = 79.2 bits (192), Expect = 1e-14

Identities = 50/147 (34%), Positives = 68/147 (46%), Gaps = 13/147 (8%) 

Query: 3 FSWFKLFHLFFVISWFAGLFYLPRIEVNMAMIDAPRGNPEYVRLSGMAVRLYRFMSPLGF 62 

F W K FH+ VI SW A LFYLPR+FV A + V++ +LY F++ 

Sbjct: 8 FLWVKAFHVIAVISWMAALFYLPRLFVYHAENAHKKEFVGWQIQEK— KLYSFIASPAM 65 

Query: 63 GAWFGAAIP FAAGRWGSGWVHVKLCLGLMLLAYQLYCGVLLRRFQDYSNAFS 115 

G + + F +G GW+H KL L ++LLAY YC +R + + 
Sbjct: 66 GFTLITGILMLLIEPTLFKSG GWLHAKLALWLLLAYHFYCKKCMRELEKDPTRRN 121 

Query: 116 HRWYRVFNEIPXXXXXXXXXXXXFKPF 142 

R+YRVFNE P KPF 
Sbjct: 122 ARFYRVFNEAPTILMILIVILWVKPF 148 

Based on this analysis, it is predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 91 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 765>: 

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC 

51 GGTTTGGGGC GGATGGTCTT AACTGAAGCC CGAGCCGCAC GTGCTTGATA 

101 TTACGGAAAC GGTCAGGCGC GGC // 

//.. ATTTCGTTTA CGATTTTGTC CGAACCGGAT ACGCCGATTA AGGCGAAGCT 

51 CGACAGCGTC GACCCCGGGC TGACCACGAT GTCGTCGGGC GGTTACAACA 

101 GCAGTACGGA TACGGCTTCC AATGCGGTCT ACTATTATGC CCGTTCGTTT 

151 GTGCCGAATC CGGACGGCAA ACTCGCCACG GGGATGACGA CGCAGAATAC 

201 GGTTGAAATC GACGGCGTGA AAAATGTGCT GATTATTCCG TCGCTGACCG 

251 TGAAAAATCG CGGCGGCAAG GCGTTTGTGC GCGTGTTGGG TGCGGACGGC 

301 AAGGCGGCGG AACGCGAAAT CCGGACCGGT ATGAGAGACA GTATGAATAC 

351 CGAAGTAAAA AGCGGGTTGA AAGAGGGGGA CAAAGTGGTC ATCTCCGAAA 

401 TAACCGCCGC CGAGCAACAG GAAAGCGGCG AACGCGCCCT AGGCGGCCCG 

451 CCGCGCCGAT AA 

This corresponds to the amino acid sequence <SEQ ID 766; ORF85>: 



1 MAKMMKWAAV AAVAAA AVWG GWS.LKPEPH VLDITETVRR G 

51 

101 

151 

201 I SFTILSEiPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 

301 MTTQNTVEID GVKNVLIIPS LTVKNRGGKA FVRVLGADGK AAEREIRTGM 

351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

Further work revealed the further partial nucleotide sequence <SEQ ID 767>: 

1 . . GTATCGGTCG GCGCGCAGGC ATCGGGGCAG ATTAAGATAC TTTATGTCAA 
51 ACTCGGGCAA CAGGTTAAAA AGGGCGATTT GATTGCGGAA ATCAATTCGA 
101 CCTCGCAGAC CAATACGCTC AATACGGAAA AATCCAAGTT GGAAACGTAT 
151 CAGGCGAAGC TGGTGTCGGC ACAGATTGCA TTGGGCAGCG CGGAGAAGAA 

201 ATATAAGCGT CAGGCGGCGT TATGGAAGGA AAACGCGACT TCCAAAGAGG 
251 ATTTGGAAAG CGCGCAGGAT GCGTTTGCCG CCGCCAAAGC CAATGTTGCC 
301 GAGCTGAAGG CTTTAATCAG ACAGAGCAAA ATTTCCATCA ATACCGCCGA 

351 GTCGGAATTG GGCTACACGC GCATTACCGC AACGATGGAC GGCACGGTGG 
401 TGGCGATTCT CGTGGAAGAG GGGCAGACTG TGAACGCGGC GCAGTCTACG
451 CCGACGATTG TCCAATTGGC GAATCTGGAT ATGATGTTGA ACAAAATGCA

501 GATTGCCGAG GGCGATATTA CCAAGGTGAA GGCGGGGCAG GATATTTCGT
551 TTACGATTTT GTCCGAACCG GATACGCCGA TTAAGGCGAA GCTCGACAGC 
601 GTCGACCCCG GGCTGACCAC GATGTCGTCG GGCGGTTACA ACAGCAGTAC
651 GGATACGGCT TCCAATGCGG TCTACTATTA TGCCCGTTCG TTTGTGCCGA
701 ATCCGGACGG CAAACTCGCC ACGGGGATGA CGACGCAGAA TACGGTTGAA
751 ATCGACGGCG TGAAAAATGT GCTGATTATT CCGTCGCTGA CCGTGAAAAA
801 TCGCGGCGGC AAGGCGTTTG TGCGCGTGTT GGGTGCGGAC GGCAAGGCGG
851 CGGAACGCGA AATCCGGACC GGTATGAGAG ACAGTATGAA TACCGAAGTA
901 AAAAGCGGGT TGAAAGAGGG GGACAAAGTG GTCATCTCCG AAATAACCGC
951 CGCCGAGCAA CAGGAAAGCG GCGAACGCGC CCTAGGCGGC CCGCCGCGCC
1001 GATAA

This corresponds to the amino acid sequence <SEQ ID 768; ORF85-1>:

1 VSVGAQASGQ IKILYVKLGQ QVKKGDLIAE INSTSQTNTL NTEKSKLETY

51 QAKLVSAQIA LGSAEKKYKR QAALWKENAT SKEDLESAQD A F AAAKAN V A 

101 ELKALIRQSK ISINTAESEL GYTRITATMD GTWAI LVEE GQTVNAAQST 

151 PTIVQLANLD MMLNKMQIAE GDITKVKAGQ DISFZTLSEP DTPIKAKLDS 

201 VDPGLTTMSS GGYNSSTDTA SNAVYYYARS FVPNPDGKLA TGMTTQNTVE 

251 IDGVKNVLII PSLTVKNRGG KAFVRVLGAD GKAAEREIRT GMRDSMNTEV

301 KSGLKEGDKV VISEITAAEQ QESGERALGG PPRR* 

Computer analysis of this amino acid sequence gave the following results: 






Homology with a predicted ORF from N. meningitidis (strain A)

ORF85 shows 87.8% identity over a 41 aa overlap and 99.3% identity over a 153 aa overlap with an ORF (ORF85a) from strain A of N. meningitidis:



MAKMMKWAAVAAVAAAAVWGGWS-LKPEPHVLDITETVRRG 
I I I I I I I I I I I I I I I ! I I I I I ! I Mill:: I I I 1 M I I 

MAKMMKWAAVAAVAAAAVWGGWSYLKPEPQAAYITETVRRGDISRTVSATGEISPSNLVS 



30 or f 85a TIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSSG 

210 220 230 240 250 260 

110 120 130 140 150 160 

orf85.pep GYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQ1SITVEIDGVKNVLIIPSLTVKNRGGK 
35 I I I I I I I I I I I I I M I I I I I I I I II I I i I I I I I ! I I I ! I I I I I I I I I I I I I I I I II I I I : 

or f 85a GYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGGR 
270 280 290 300 310 320 

170 180 190 200 210 220 

40 orf85.pep afvrvlgadgkaaereirtgmrdsmntevksglkegdkwiseitaaeqqesgeralggp 

1 I I I I I i I I M I I I I i II I I I I II I I I II I I I I I I I I I I I I I M M I I I I I I I I I I I I i I 

orf8 5a AFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGP 
330 340 350 360 370 380 

45 230 

or f 8 5. pep PRRX 

or f 8 5a PRRX 
390 

The complete length ORF85a nucleotide sequence <SEQ ID 769> is:

1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCAGC 

51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAGCCGCAG GCTGCTTATA 

101 TTACGGAAAC GGTCAGGCGC GGCGACATCA GCCGGACGGT TTCTGCAACA 

151 GGGGAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCATCGGG 

201 GCAGATTAAG AAACTTTATG TCAAACTCGG GCAACAGGTT AAAAAGGGCG

251 ATTTGATTGC GGAAATCAAT TCGACCTCGC AGACCAATAC GCTCAATACG

301 GAAAAATCCA AATTGGAAAC GTATCAGGCG AAGCTGGTGT CGGCACAGAT 

351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGTCAGGCG GCGTTGTGGA 

401 AGGATGATGC GACCGCTAAA GAAGATTTGG AAAGCGCACA GGATGCGCTT

451 GCCGCCGCCA AAGCCAATGT TGCCGAGCTG AAGGCTCTAA TCAGACAGAG

501 CAAAATTTCC ATCAATACCG CCGAGTCGGA ATTGGGCTAC ACGCGCATTA

551 CCGCAACGAT GGACGGCACG GTGGTGGCGA TTCTCGTGGA AGAGGGGCAG 

601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT 

651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG 

701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG

751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC 

801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTACT 

851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG 

901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGCTGAT

951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAGGGCG TTTGTGCGCG 

1001 TGTTGGGTGC AGACGGCAAG GCGGCGGAAC GCGAAATCCG GACCGGTATG 

1051 AGAGACAGTA TGAATACCGA AGTAAAAAGC GGGTTGAAAG AGGGGGACAA 

1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC 

1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA 

This encodes a protein having amino acid sequence <SEQ ID 770>: 



1 MAKMMKWAAV AAVAAA AVWG GWSYLKPEPQ AAYITETVRR GDISRTVSAT 

51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STSQTNTLNT 

101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKD DAT AK EDLESAQDAL 

151 AAAKANVAEL KALIRQSKIS INTAESELGY TRITATMDGT WAILVEEGQ 

201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 

301 MTTQNTVEID GVKNVLIIPS LTVKNRGGRA FVRVLGADGK AAEREIRTGM 

351 RDSMNTEVKS GLKEGDKVVI SEITAAEQQE SGERALGGPP RR* 

ORF85a and ORF85-1 show 98.2% identity in 334 aa overlap: 



PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 
I I I I I I I ! I I I I I I I I I I I I I II ] I I 1 I I 
VSVGAQASGQIKI LYVKLGQQVKKGDLIAE 
10 20 30 

90 100 110 120 130 140 

INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATAKEDLESAQD 
t M I I I I I I I I I I 1 I ! I I I I I ! I I I I I I 1 I I I I I I I I I I I I I I I I I :: I | : | | | | | ! I I I 
INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD 
40 50 60 70 80 90 

150 160 170 180 190 200 

ALAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTWAILVEEGQTVNAAQST 
I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | M I i || | 
AFAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTWAILVEEGQTVNAAQST 
100 110 120 130 140 150 

210 220 230 240 250 260 

PTIVQLANLDbMLNKMQIAEGDITKVKAGQDI SFTILSEPDT PIKAKLDSVDPGLTTMSS 
M f I I I I I I I I I II I I I I I I I I I I I I I I I I | | I | ] | J | | | || !| | | | | | | | | | | | | | I | | 
PTIVQLANLDMMLNKMQIAEGD ITKVKAGQDI SET I LSEPDTPIKAKLDSVDPGLTTMSS 
160 170 180 190 200 210 

270 280 290 300 310 320 

GGYN S ST DT ASNAVYYYARS FVPN PDGKLATGMTTQNTVE I DGVKNVL HPS LTVKNRGG 

I I I I I I I I I I I I I I I I I I I I I M I I 1 II I II I I I I I I 1 I I I I I I I I I I I I Ill 

GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGG 
220 230 240 250 260 270 

330 340 350 360 370 380 

RAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 

: I I I I I I I I I I I M I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I II II I I | | || 

KAFVRVLGADGPxAAEREIRTGMRDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGG 
280 290 300 310 320 330 



Figure 19D shows plots of hydrophilicity, antigenic index, and AMPHI regions for ORF85a.

Homology with a predicted ORF from N. gonorrhoeae

ORF85 shows a high degree of identity with a predicted ORF (ORF85ng) from N. gonorrhoeae: 



ORF85 1 MAKMMKWAAVAAVAAAAVWGGWS . LKPEPHVLDITETVRRG 40 

I ! I I I I I I ! I I I I I I I M [ I I i I I II M : : 111:1111 

ORF85ng 1 MAKMMKWAAVAAVAAAAWGGWSYLKPEPQAAYITEAVRRGDISRTVSAT 50 

ORF85 ISFTILSEPDT 250 

I I I I I I I I I I I 

ORF85ng 201 TVNAAQSTPTIVQLANLDMMLNKMQIAEGDITKVICAGQDISFTILSEPDT 250 

ORF85 251 P IKAKLDS VDPGLTTMS SGGYNS STDTASNAVY YYARS FVPNPDGKLATG 300 

II I I I I II I I I I II I I I I i M I 1 I I M II II I I I I I I I ( I ! II I II I I I I 
ORF85ng 251 PIKAKLDSVDPGLTTMS SGGYNS ST DTASNAVYYYARSFVPNPDGKLATG 300 

ORF85 301 MTTQNTVEIDGVKNVLIIPSLTVKNRGGKAFVRVLGADGKAAEREIRTGM 350 

I I I I I I I I I II II I II : I I I I I I I II II I I II II I I I I I I I MINIM 
ORF85ng 301 MTTQNTVEIDGVKNVLLIPSLTVKNRGGKAFVRVLGADGKAVEREIRTGM 350 

ORF85 152 RDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGGPPRR 393 

: I I I I I I I I I I II I I I I I M I II I II I II M II M I I I I I I I 
ORF85ng 351 KDSMNTEVKSGLKEGDKWISEITAAEQQESGERALGGPPRR 3 93 



The complete length ORF85ng nucleotide sequence <SEQ ID 771> is: 



1 ATGGCAAAAA TGATGAAATG GGCGGCTGTT GCGGCGGTCG CGGCGGCaac 

51 GGTTTGGGGC GGATGGTCTT ATCTGAAGCC CGAACCGCAG GCTGCTTATA 

101 TTACGGAaac ggTCAGGCGC GGCGATATCA GCCGGACGGT TTCCGCGACG 

151 GgcgAGATTT CGCCGTCCAA CCTGGTATCG GTCGGCGCGC AGGCTT CGGG 

201 GCAGATTAAA AAGCTTTATG TCAAACTCGG GCAACAGGTC AAAAAGGGCG 

251 ATTTGATTGC GGAAATCAAT TCGACCACGC AGACCAACAC GATCGATATG 

301 GAAAAATCCA AATTGGAAAC GTAT CAGGCG AAGCTGGTGT CGGCACAGAT 

351 TGCATTGGGC AGCGCGGAGA AGAAATATAA GCGT CAGGCG GCGTTGTGGA 

401 AGGATGATGC GACCTCTAAA GAAGATTTGG AAAGCGCGCA GGATGCGCTT

451 GCCGCCGCCA AAGCCAATGT TGCCGAGTTG AAGGCTTTAA TCAGACAGAG

501 CAAAATTTCC ATCAATACCG CCGAGTCGGA TTTGGGCTAC ACGCGCATTA 

551 CCGCGACGAT GGACGGCACG GTGGTGGCGA TTCCCGTGGA AGAGGGGCAG 

601 ACTGTGAACG CGGCGCAGTC TACGCCGACG ATTGTCCAAT TGGCGAATCT 

651 GGATATGATG TTGAACAAAA TGCAGATTGC CGAGGGCGAT ATTACCAAGG 

701 TGAAGGCGGG GCAGGATATT TCGTTTACGA TTTTGTCCGA ACCGGATACG
751 CCGATTAAGG CGAAGCTCGA CAGCGTCGAC CCCGGGCTGA CCACGATGTC

801 GTCGGGCGGC TACAACAGCA GTACGGATAC GGCTTCCAAT GCGGTCTATT
851 ATTATGCCCG TTCGTTTGTG CCGAATCCGG ACGGCAAACT CGCCACGGGG 
901 ATGACGACGC AGAATACGGT TGAAATCGAC GGTGTGAAAA ATGTGTTGCT 
951 TATTCCGTCG CTGACCGTGA AAAATCGCGG CGGCAAGGCG TTCGTACGCG 

1001 TGTTGGGTGC GGACGGCAAG GCAGTGGAAC GCGAAATCCG GACCGGTATG 

1051 AAAGACAGTA TGAATACCGA AGTGAAAAGC GGGTTGAAAG AGGGGGACAA 

1101 AGTGGTCATC TCCGAAATAA CCGCCGCCGA GCAGCAGGAA AGCGGCGAAC 

1151 GCGCCCTAGG CGGCCCGCCG CGCCGATAA 

This encodes a protein having amino acid sequence <SEQ ID 772>: 

1 MAKMMKWAAV AAVAAA AVWG GWSYLKPEPQ AAYITEAVR R GD ISRTVSAT 

51 GEISPSNLVS VGAQASGQIK KLYVKLGQQV KKGDLIAEIN STTQTNTIDM 

101 EKSKLETYQA KLVSAQIALG SAEKKYKRQA ALWKDDATSK EDLE SAQDAL 

151 AAAKANVAEL KALIRQSKIS INTAESDLGY TRITATMDGT WAIPVEEGQ 

201 TVNAAQSTPT IVQLANLDMM LNKMQIAEGD ITKVKAGQDI SFTILSEPDT 

251 PIKAKLDSVD PGLTTMSSGG YNSSTDTASN AVYYYARSFV PNPDGKLATG 

301 MTTQNTVEID GVKNVLLIPS LTVKNRGGKA FVRVLGADGK AVEREIRTGM 

351 KDSMNTEVKS GLKEGDKWI SEITAAEQQE SGERALGGPP RR* 

ORF85ng and ORF85-1 show 96.1% identity in 334 aa overlap: 

         30        40        50        60        70        80
orf85ng  PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE

         90       100       110       120       130       140
orf85ng  INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEDLESAQD
         ||||:||||:: |||||||||||||||||||||||||||||||||::|||||||||||||
orf85-1  INSTSQTNTLNTEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKENATSKEDLESAQD
                 40        50        60        70        80        90

        150       160       170       180       190       200
orf85ng  ALAAAKANVAELKALIRQSKISINTAESDLGYTRITATMDGTVVAIPVEEGQTVNAAQST
         |:||||||||||||||||||||||||||:||||||||||||||||| |||||||||||||
orf85-1  AFAAAKANVAELKALIRQSKISINTAESELGYTRITATMDGTVVAILVEEGQTVNAAQST
                100       110       120       130       140       150

        210       220       230       240       250       260
orf85ng  PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf85-1  PTIVQLANLDMMLNKMQIAEGDITKVKAGQDISFTILSEPDTPIKAKLDSVDPGLTTMSS
                160       170       180       190       200       210

        270       280       290       300       310       320
orf85ng  GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLLIPSLTVKNRGG
         ||||||||||||||||||||||||||||||||||||||||||||||||:|||||||||||
orf85-1  GGYNSSTDTASNAVYYYARSFVPNPDGKLATGMTTQNTVEIDGVKNVLIIPSLTVKNRGG
                220       230       240       250       260       270

        330       340       350       360       370       380
orf85ng  KAFVRVLGADGKAVEREIRTGMKDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGG
         |||||||||||||:||||||||:|||||||||||||||||||||||||||||||||||||
orf85-1  KAFVRVLGADGKAAEREIRTGMRDSMNTEVKSGLKEGDKVVISEITAAEQQESGERALGG
                280       290       300       310       320       330

orf85ng  PPRRX
         |||||
orf85-1  PPRRX



In addition, ORF85ng shows significant homology to an E. coli membrane fusion protein: 

gi|1787104 (AE000189) o380; 27% identical (27 gaps) to 332 residues from 
membrane fusion protein precursor, MTRC_NEIGO SW: P43505 (412 aa) [Escherichia 
coli] Length = 380 
Score = 193 bits (485), Expect = 2e-48 

Identities = 120/345 (34%), Positives = 182/345 (51%), Gaps = 13/345 (3%) 

Query: 29  PQAAYITETVRRGDISRTVSATGEISPSNLVSVGAQASGQIKKLYVKLGQQVKKGDLIAE 88
           P   Y T  VR GD+ ++V ATG++     V VGAQ SGQ+K L V +G +VKK  L+
Sbjct: 41  PVPTYQTLIVRPGDLQQSVLATGKLDALRKVDVGAQVSGQLKTLSVAIGDKVKKDQLLGV 100

Query: 89  INSTTQTNTIDMEKSKLETYQAKLVSAQIALGSAEKKYKRQAALWKDDATSKEXXXXXXX 148
           I+     N I   ++ L   +A+   A+  L  A   Y RQ  L +  A S++
Sbjct: 101 IDPEQAENQIKEVEATLMELRAQRQQAEAELKLARVTYSRQQRLAQTKAVSQQDLDTAAT 160

Query: 149 XXXXXXXXXXXXXXXIRQSKISINTAESDLGYTRITATMDGTVVAIPVEEGQTVNAAQST 208
           I++++ S++TA+++L YTRI A M G V I +GQTV AAQ
Sbjct: 161

Query: 209
           ML K Q++E D+ +K GQ FT+L +P T
Sbjct: 221

Query: 269
           + + ++A++YYAR VPNP+G L MT Q +++ VKNVL IP + + G
Sbjct: 274 -TPEKVNDAIFYYARFEVPNPNGLLRLDMTAQVHIQLTDVKNVLTIPLSALGDPVG 328

Query: 329
           +V L +G+ ERE+ G ++ + E+ GL+ GD+VVI E
Sbjct: 329 DNRYKVKLLRNGETREREVTIGARNDTDVEIVKGLEAGDEVVIGE 373


Based on this analysis, it was predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF85-1 (40.4 kDa) was cloned in the pGex vectors and expressed in E. coli, as described above. 
The products of protein expression and purification were analyzed by SDS-PAGE. Figure 19A 
shows the results of affinity purification of the GST-fusion protein. Purified GST-fusion protein 
was used to immunise mice, whose sera were used for Western blot (Figure 19B), FACS analysis 
(Figure 19C), and ELISA (positive result). These experiments confirm that ORF85-1 is a 
surface-exposed protein, and that it is a useful immunogen. 

Example 92 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 773>: 



1 . .ATTCCCGCCA CGATGACATT TGAACGCAGC GGCAATGCTT ACAAAATCGT 

51 TTCGACGATT AAAGTGCCGC TATACAATAT CCGTTTCGAG TCCGGCGGTA 

101 CGGTTGTCGG CAATACCCTG CACCCTACCT ACTATAGAGA CATACGCAGG 

151 GGCAAACTGT ATGCGGAAgc CAAATTCGCC GACgGcAGCG TAACTTACGG 

201 CAAAGCGGGC GAGAGCAAAA CCGAGCAAAG CCCCAAGGCT ATGGATTTGT 

251 TCACGCTTGC CTGGCAGTTG GCGGCAAATG ACGCGAAACT CCCCCCGGGG 

301 CTGAAAATCA CCAACGGCAA AAAACTTTAT TCCGTCGGCG GTTTGAATAA 
351 GGCGGGTACA GGAAAATACA GCATAGGCGG CGTGGAAACC GAAGTCGTCA 

401 AATATCGGGT GCGGCGCGGC GACGATGCGG TAATGTATTT cTTCGCACCG 
451 TCCCTGAACA ATATTCCGGC ACAAATCGGC TATACCGACG ACGGCAAAAC 
501 CTATACGCTG AAACTCAAAT CGGTGCAGAT CAACGGCCAG GCAGCCAAAC 
551 CGTAA 



This corresponds to the amino acid sequence <SEQ ID 774; ORF120>: 

1 ..IPATMTFERS GNAYKIVSTI KVPLYNIRFE SGGTVVGNTL HPTYYRDIRR 
51 GKLYAEAKFA DGSVTYGKAG ESKTEQSPKA MDLFTLAWQL AANDAKLPPG 
101 LKITNGKKLY SVGGLNKAGT GKYSIGGVET EVVKYRVRRG DDAVMYFFAP 
151 SLNNIPAQIG YTDDGKTYTL KLKSVQINGQ AAKP* 

Further work revealed the complete nucleotide sequence <SEQ ID 775>: 

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 
51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CCAATCCGCC GTGCTGCACT 
101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC 
151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 
201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT 
251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 
301 GGCAGCGTAA CTTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC 
351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG 
401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 
451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT 
501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA 
551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 
601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA 
651 CGGCCAGGCA GCCAAACCGT AA 

This corresponds to the amino acid sequence <SEQ ID 776; ORF120-1>: 

1 MMKTFKNIFS AAILSAALPC AYAAGLPQSA VLHYSGSYGI PATMTFERSG 
51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD 
101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS 
151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY 
201 TDDGKTYTLK LKSVQINGQA AKP* 
Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF120 shows 92.4% identity over a 184 aa overlap with an ORF (ORF120a) from strain A of N. 
meningitidis: 

                                                  10        20        30
orf120.pep                                IPATMTFERSGNAYKIVSTIKVPLYNIRFE
                                          ||||      :|| ||||||||||||||||
orf120a     SAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIKVPLYNIRFE
           10        20        30        40        50        60

                    40        50        60        70        80        90
orf120.pep  SGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL
            |||||||||||||||||||||||||||||||||||||||     :|||||||||||||||
orf120a     SGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAMDLFTLAWQL
           70        80        90       100       110       120

                   100       110       120       130       140       150
orf120.pep  AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDAVMYFFAP
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf120a     AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGDDAVMYFFAP
          130       140       150       160       170       180

                   160       170       180
orf120.pep  SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
            |||||||||||||||||||||||||||||||||||
orf120a     SLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
          190       200       210       220
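
The 184 aa overlap reflects the fact that ORF120 is a partial sequence. As a minimal sketch (not part of the original analysis), the position of the partial ORF120 sequence within the complete ORF120-1 sequence of the same strain can be located with a plain substring search. The helper name `clean` is hypothetical, and the sequences are transcribed for this example without the leading dots and the stop symbol.

```python
def clean(seq: str) -> str:
    """Remove the spaces and line breaks used in the printed listings."""
    return "".join(seq.split())


# Complete ORF120-1 protein (SEQ ID 776), stop symbol omitted.
ORF120_1 = clean("""
    MMKTFKNIFS AAILSAALPC AYAAGLPQSA VLHYSGSYGI PATMTFERSG
    NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD
    GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS
    VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY
    TDDGKTYTLK LKSVQINGQA AKP
""")

# Partial ORF120 protein (SEQ ID 774), leading dots and stop symbol omitted.
ORF120 = clean("""
    IPATMTFERS GNAYKIVSTI KVPLYNIRFE SGGTVVGNTL HPTYYRDIRR
    GKLYAEAKFA DGSVTYGKAG ESKTEQSPKA MDLFTLAWQL AANDAKLPPG
    LKITNGKKLY SVGGLNKAGT GKYSIGGVET EVVKYRVRRG DDAVMYFFAP
    SLNNIPAQIG YTDDGKTYTL KLKSVQINGQ AAKP
""")

offset = ORF120_1.find(ORF120)  # 0-based index, -1 if the fragment is absent
print(f"ORF120 starts at residue {offset + 1} of ORF120-1 and spans {len(ORF120)} aa")
```

Under these assumptions the fragment begins at residue 40 of the complete protein and runs to its C-terminus, which is why every comparison involving ORF120 is limited to 184 residues.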

The complete length ORF120a nucleotide sequence <SEQ ID 777> is: 

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 
51 CCTGCCGTGC GCGTATGCGG CAGGGCTGCC CNAATCCGCC GTGCTGCACT 
101 ATTCCGGCAG CTACGGCATT CCCGCCACNA NNANNTNNGN ACNNNGNGNC 
151 AATGCTTNCA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 
201 TTTCGAGTCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTACCTACT 
251 ATAGAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 
301 GGCAGCGTAA CCTACGGCAA AGCGGNNNNN ANCNNNNNNG NGCAAAGCCC 
351 CAAGGCTATG GATTTGTTCA CGCTTGCNTG GCAGTTGGCG GCAAATGACG 
401 CGAAACTCCC CCCGGGGCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 
451 GTCGGCGGTT TGAATAAGGC GGGTACAGGA AAATACAGCA TAGGCGGCGT 
501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATGCGGTAA 
551 TGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 
601 ACCGACGACG GCAAAACCTA TACGCTGAAA CTCAAATCGG TGCAGATCAA 
651 CGGCCAGGCA GCCAAACCGT AA 

This encodes a protein having amino acid sequence <SEQ ID 778>: 

1 MMKTFKNIFS AAILSAALPC AYAAGLPXSA VLHYSGSYGI PATXXXXXXX 
51 NAXKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD 
101 GSVTYGKAXX XXXXQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS 
151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY 
201 TDDGKTYTLK LKSVQINGQA AKP* 

ORF120a and ORF120-1 show 93.3% identity in 223 aa overlap: 

                      10        20        30        40        50        60
orf120a.pep   MMKTFKNIFSAAILSAALPCAYAAGLPXSAVLHYSGSYGIPATXXXXXXXNAXKIVSTIK
              ||||||||||||||||||||||||||| |||||||||||||||      :|| |||||||
orf120-1      MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf120a.pep   VPLYNIRFESGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAXXXXXXQSPKAM
              ||||||||||||||||||||||||||||||||||||||||||||||||     :||||||
orf120-1      VPLYNIRFESGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf120a.pep   DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf120-1      DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD
                     130       140       150       160       170       180

                     190       200       210       220
orf120a.pep   DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
              ||||||||||||||||||||||||||||||||||||||||||||
orf120-1      DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
                     190       200       210       220
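
For illustration, the reported identity figure can be reproduced with a simple position-by-position comparison. This is only a sketch under stated assumptions: the function name `percent_identity` is hypothetical, the two sequences are compared without gaps over their full 223 residues, and undetermined residues (X) are counted as mismatches. The alignment program actually used is not identified in this section.

```python
def clean(seq: str) -> str:
    """Remove the spaces and line breaks used in the printed listings."""
    return "".join(seq.split())


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# ORF120a (SEQ ID 778) and ORF120-1 (SEQ ID 776), stop symbols omitted.
ORF120A = clean("""
    MMKTFKNIFS AAILSAALPC AYAAGLPXSA VLHYSGSYGI PATXXXXXXX
    NAXKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD
    GSVTYGKAXX XXXXQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS
    VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY
    TDDGKTYTLK LKSVQINGQA AKP
""")
ORF120_1 = clean("""
    MMKTFKNIFS AAILSAALPC AYAAGLPQSA VLHYSGSYGI PATMTFERSG
    NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PTYYRDIRRG KLYAEAKFAD
    GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS
    VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DAVMYFFAPS LNNIPAQIGY
    TDDGKTYTLK LKSVQINGQA AKP
""")

identity = percent_identity(ORF120A, ORF120_1)
print(f"{identity:.1f}% identity in {len(ORF120A)} aa overlap")
```

With these inputs the 15 differing positions all correspond to undetermined residues in the strain A sequence, and 208 matches out of 223 positions gives 93.3%.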

Homology with a predicted ORF from N. gonorrhoeae 

ORF120 shows 97.8% identity over 184 aa overlap with a predicted ORF (ORF120ng) from 
N. gonorrhoeae: 

orf 12 0 pep IPATMT FERSGNAYKIVSTIKVPLYNIRFE 30 

I II M I I I I I I I I I I I I I I 1 I I I I I I I I I I 
orfl2 0ng SAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMT FERSGNAYKIVSTIKVPLYNIRFE 69 

orfl20 pep SGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 90 

| | | | | | | | | | I I : I I : I II I I I I II I I I 1 I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl20ng SGGTWGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAMDLFTLAWQL 12 9 

orf 12 0 pep AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDAVMYFFAP 150 

I I I I I I I I I I I I I I I II I 1111111111111:1 I I I I I 

orfl20ng AANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEWKYRVRRGDDTVTYFFAP 18 9 

orf 120. pep SLNN I PAQIGYT DDGKTYT LKLKSVQINGQAAKP 184 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I 
orfl2 0ng SLNNI PAQIGYT DDGKTYTLKLKSVQINGQAAKP 223 

The complete length ORF120ng nucleotide sequence <SEQ ID 779> is: 

1 ATGATGAAGA CTTTTAAAAA TATATTTTCC GCCGCCATTT TGTCCGCCGC 
51 CCTGCCGTGC GCGTATGCGG CAAGGCTACC CCAATCCGCC GTGCTGCACT 
101 ATTCCGGCAG CTACGGCATT CCCGCCACGA TGACATTTGA ACGCAGCGGC 
151 AATGCTTACA AAATCGTTTC GACGATTAAA GTGCCGCTAT ACAATATCCG 
201 TTTCGAATCC GGCGGTACGG TTGTCGGCAA TACCCTGCAC CCTGCCTACT 
251 ATAAAGACAT ACGCAGGGGC AAACTGTATG CGGAAGCCAA ATTCGCCGAC 
301 GGCAGCGTAA CCTACGGCAA AGCGGGCGAG AGCAAAACCG AGCAAAGCCC 
351 CAAGGCTATG GATTTGTTCA CGCTTGCCTG GCAGTTGGCG GCAAATGACG 
401 CGAAACTCCC CCCGGGTCTG AAAATCACCA ACGGCAAAAA ACTTTATTCC 
451 GTCGGCGGCC TGAATAAGGC GGGTACGGGA AAATACAGCA TaggCGGCGT 
501 GGAAACCGAA GTCGTCAAAT ATCGGGTGCG GCGCGGCGAC GATACGGTAA 
551 CGTATTTCTT CGCACCGTCC CTGAACAATA TTCCGGCACA AATCGGCTAT 
601 ACCGACGACG GCAAAACCTA TACGCTGAAG CTCAAATCGG TGCAGATCAA 
651 CGGACAGGCC GCCAAACCGT AA 

This encodes a protein having amino acid sequence <SEQ ID 780>: 

1 MMKTFKNIFS AAILSAALPC AYAARLPQSA VLHYSGSYGI PATMTFERSG 
51 NAYKIVSTIK VPLYNIRFES GGTVVGNTLH PAYYKDIRRG KLYAEAKFAD 
101 GSVTYGKAGE SKTEQSPKAM DLFTLAWQLA ANDAKLPPGL KITNGKKLYS 
151 VGGLNKAGTG KYSIGGVETE VVKYRVRRGD DTVTYFFAPS LNNIPAQIGY 
201 TDDGKTYTLK LKSVQINGQA AKP* 

In comparison with ORF120-1, ORF120ng shows 97.8% identity in 223 aa overlap: 

                      10        20        30        40        50        60
orf120-1.pep  MMKTFKNIFSAAILSAALPCAYAAGLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK
              |||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||
orf120ng      MMKTFKNIFSAAILSAALPCAYAARLPQSAVLHYSGSYGIPATMTFERSGNAYKIVSTIK
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf120-1.pep  VPLYNIRFESGGTVVGNTLHPTYYRDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM
              |||||||||||||||||||||:||:|||||||||||||||||||||||||||||||||||
orf120ng      VPLYNIRFESGGTVVGNTLHPAYYKDIRRGKLYAEAKFADGSVTYGKAGESKTEQSPKAM
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf120-1.pep  DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf120ng      DLFTLAWQLAANDAKLPPGLKITNGKKLYSVGGLNKAGTGKYSIGGVETEVVKYRVRRGD
                     130       140       150       160       170       180

                     190       200       210       220
orf120-1.pep  DAVMYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
              |:| ||||||||||||||||||||||||||||||||||||||||
orf120ng      DTVTYFFAPSLNNIPAQIGYTDDGKTYTLKLKSVQINGQAAKPX
                     190       200       210       220

This analysis, including the presence of a putative leader sequence in the gonococcal protein, 
suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 

Example 93 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 781>: 

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC 

51 .GCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATT . . 

This corresponds to the amino acid sequence <SEQ ID 782; ORF121>: 

1 MYRRKGRGIK PWMGAGXAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV 
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL 
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 
151 RQGGNI . . 

Further work revealed the complete nucleotide sequence <SEQ ID 783>: 

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG GTGCCGGTGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATCGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGAGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC 

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TTGCCAAACT GGTTCCGAgG CGTTTTGCCG GTGCTTATAC GCGCATTACA 

601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT 

651 AATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGGTG CTGGTCGGGC 

701 TGGATTCGGG GTTTGCCATC GGTATGCTTG CCGGTATTTT GGTGTTTGTC 

751 CCTTATCTCG GGGCGTTTAC GGGATTGCTG CTTGCCACCG TCGCCGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGCATCCT ATCGGTTTGG GCGGTTTTTG 
851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA 
901 GACCGTATCG GGCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 

951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCGGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC 

1051 AGTTTTTACC GGGGCAGGTA G 

This corresponds to the amino acid sequence <SEQ ID 784; ORF121-1>: 

1 MYRRKGRGIK PWMGAGAAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV 
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL 
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 
151 RQGGNIVSSI GNLLLLPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT 
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLV LVGLDSGFAI GMLAGILVFV 
251 PYLGAFTGLL LATVAALLQF GSWNGILSVW AVFAVGQFLE SFFITPKIVG 
301 DRIGLSPFWV IFSLMAFGQL MGFVGMLAGL PLAAVTLVLL REGVQKYFAG 
351 SFYRGR* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF121 shows 98.7% identity over a 156 aa overlap with an ORF (ORF121a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 12 1 pep MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 
I I I I I I I I I I M I M I I I I I I I I I I I I ! I ! I I I I I I I I I I I I M I I I I I I I M I I I I I 
orf 121a MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 121 pep ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 

Ml I I I I I M I I I I I II I I I I I II I I I I 1 I 1 I I I M I I I 

orf 12 la ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 

70 80 90 100 110 120 

130 140 150 

orf 121 .pep E I D QAS I I AW LQ AH T GE L SN ALKAW F PVLMRQGGN I 
I I [ I I II I I I I I I I I M I I I I I II I I I II II I I I I I 
orf 121a EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW 

130 140 150 160 170 180 

orf 121a SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI 
190 200 210 220 230 240 

The complete length ORF121a nucleotide sequence <SEQ ID 785> is: 

1 ATGTATCGGA GGAAAGGGCG GGGCATCAAG CCGTGGATGG ATGCCGGTGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTT CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTATTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC 

251 CTATGCTGGT CGGGCAGTTC AACAATTTGG CATCGCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG CTTCAGGCGC 

401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AGGCAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCTTCC 

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TTGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACA 

601 GGCAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGGC AGCTTCTGGT 

651 GATGCTGATT ATGGGTTTGG TTTACGGCTT GGGGTTGGTG CTGGTCGGGC 

701 TGGATTCGGG GTTTGCAATC GGTATGGTTG CCGGTATTTT GGTTTTTGTT 
751 CCCTATTTGG GCGCGTTTAC AGGACTGCTG CTGGCAACCG TCGCCGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGCATCTT GGCTGTTTGG GCGGTTTTTG 
851 CCGTAGGACA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATCGTGGGA 
901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 
951 CGGGCAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG TGCAGAAATA TTTTGCCGGC 
1051 AGTTTTTACC GGGGCAGGTA G 

This encodes a protein having amino acid sequence <SEQ ID 786>: 

1 MYRRKGRGIK PWMDAGAAFA ALVWLVFALG DTLTPFAVAA VLAYVLDPLV 
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL 
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW LQAHTGELSN ALKAWFPVLM 
151 RQGGNIVSSI GNLLLLPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT 
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLV LVGLDSGFAI GMVAGILVFV 
251 PYLGAFTGLL LATVAALLQF GSWNGILAVW AVFAVGQFLE SFFITPKIVG 
301 DRIGLSPFWV IFSLMAFGQL MGFVGMLAGL PLAAVTLVLL REGVQKYFAG 
351 SFYRGR* 

ORF121a and ORF121-1 show 99.2% identity in 356 aa overlap: 



                      10        20        30        40        50        60
orf121a.pep   MYRRKGRGIKPWMDAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
              ||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||
orf121-1      MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf121a.pep   ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1      ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf121a.pep   EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1      EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf121a.pep   SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1      SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf121a.pep   GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG
              ||:||||||||||||||||||||||||||||||||||:||||||||||||||||||||||
orf121-1      GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG
                     250       260       270       280       290       300

                     310       320       330       340       350
orf121a.pep   DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX
              |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121-1      DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX
                     310       320       330       340       350

Homology with a predicted ORF from N. gonorrhoeae 

ORF121 shows 97.4% identity over a 156 aa overlap with a predicted ORF (ORF121ng) from 
N. gonorrhoeae: 

orf 121 .pep MYRRKGRGIKPWMGAGXAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 60 
I I I I I I I I II i M I I I I I I I I I I II : I M I I N I I I I I I II I I I I II II I I I M IN I I 

orf 12 Ing MYRRKGRGIKPWMGAGAAFAALWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR 60 

orf 121 .pep ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 120 

I I I I I I I I M I I I I I ! I I I I I M I I I I I I M I I M I M I I I I I I I I I I I I II I I I 

orfl21ng ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV 120 

orfl21.pep E I DQAS I IAWLQAHTGELSNALKAWFPVLMRQGGNI 156 

I M M I M I I : I I II I I II I I I I I | | | | | | : | | | | | 
orf121ng EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSTIGNLLLPPLLLYYFLLDWHRW 180 

An ORF121ng nucleotide sequence <SEQ ID 787> was predicted to encode a protein having amino 
acid sequence <SEQ ID 788>: 

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTLTPFAVAA VLAYVLDPLV 
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL 
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM 
151 KQGGNIVSTI GNLLLPPLLL YYFLLDWHRW SCGIPKLVPR RFAGAYTRIT 
201 GNLNKVWGKF LRGQLLGETE RGAWCRVGR ECWEGGGARS RPSDDGWPRW 
251 GGG* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 789>: 

1 ATGTATCGGA GAAAAGGACG GGGCATCAAG CCGTGGATGG GTGCCGGCGC 

51 GGCGTTTGCC GCCTTGGTCT GGCTGGTTTA CGCGCTCGGC GATACTTTGA 

101 CTCCGTTTGC GGTTGCGGCG GTGCTGGCGT ATGTGTTGGA CCCTTTGGTC 

151 GAATGGTTGC AGAAAAAGGG TTTGAACCGT GCATCCGCTT CGATGTCTGT 

201 GATGGTGTTT TCCTTGATTT TGTTGTTGGC ATTATTGTTG ATTATTGTCC 

251 CTATGCTGGT CGGGCAGTTC AATAATTTGG CATCTCGCCT GCCCCAATTA 

301 ATCGGTTTTA TGCAGAACAC GCTGCTGCCG TGGTTGAAAA ATACAATCGG 

351 CGGATATGTG GAAATCGATC AGGCATCTAT TATTGCGTGG TTTCAGGCGC 

401 ATACGGGCGA GTTGAGCAAC GCGCTTAAGG CGTGGTTTCC CGTTTTGATG 

451 AAACAGGGCG GCAATATTGT CAGCAGTATC GGCAACCTGC TGCTGCCGCC 

501 CTTGCTGCTT TACTATTTCC TGCTGGATTG GCAGCGGTGG TCGTGCGGCA 

551 TCGCCAAACT GGTTCCGAGG CGTTTTGCCG GTGCTTATAC GCGCATTACG 

601 GGTAATTTGA ACGAGGTATT GGGCGAATTT TTGCGCGGTC AGCTTCTGGT 

651 GATGCTGATT ATGGGCTTGG TTTACGGTTT GGGATTGATG CTAGTCGGAC 

701 TGGATTCGGG ATTTGCCATC GGTATGGTTG CCGGTATTTT GGTGTTTGTC 

751 CCCTATTTGG GTGCGTTTAC GGGATTGCTG CTTGCCACTG TTGCAGCCTT 

801 GCTCCAGTTC GGTTCGTGGA ACGGAATCTT GGCTGTTTGG GCGGTTTTTG 

851 CCGTCGGTCA GTTTCTCGAA AGTTTTTTCA TTACGCCGAA AATTGTAGGA 

901 GACCGTATCG GCCTGTCGCC GTTTTGGGTT ATCTTTTCGC TGATGGCGTT 

951 CGGAGAGCTG ATGGGCTTTG TCGGAATGTT GGCCGGATTG CCTTTGGCCG 

1001 CCGTAACCTT GGTCTTGCTT CGCGAGGGCG CGCAGAAATA TTTTGCCGGC 

1051 AGTTTTTACC GGGGCAGGTA G 

This corresponds to the amino acid sequence <SEQ ID 790; ORF121ng-1>: 

1 MYRRKGRGIK PWMGAGAAFA ALVWLVYALG DTLTPFAVAA VLAYVLDPLV 
51 EWLQKKGLNR ASASMSVMVF SLILLLALLL IIVPMLVGQF NNLASRLPQL 
101 IGFMQNTLLP WLKNTIGGYV EIDQASIIAW FQAHTGELSN ALKAWFPVLM 
151 KQGGNIVSSI GNLLLPPLLL YYFLLDWQRW SCGIAKLVPR RFAGAYTRIT 
201 GNLNEVLGEF LRGQLLVMLI MGLVYGLGLM LVGLDSGFAI GMVAGILVFV 
251 PYLGAFTGLL LATVAALLQF GSWNGILAVW AVFAVGQFLE SFFITPKIVG 
301 DRIGLSPFWV IFSLMAFGEL MGFVGMLAGL PLAAVTLVLL REGAQKYFAG 
351 SFYRGR* 

ORF121ng-1 and ORF121-1 show 97.5% identity in 356 aa overlap: 

                      10        20        30        40        50        60
orf121-1.pep  MYRRKGRGIKPWMGAGAAFAALVWLVFALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
              ||||||||||||||||||||||||||:|||||||||||||||||||||||||||||||||
orf121ng-1    MYRRKGRGIKPWMGAGAAFAALVWLVYALGDTLTPFAVAAVLAYVLDPLVEWLQKKGLNR
                      10        20        30        40        50        60

                      70        80        90       100       110       120
orf121-1.pep  ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf121ng-1    ASASMSVMVFSLILLLALLLIIVPMLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYV
                      70        80        90       100       110       120

                     130       140       150       160       170       180
orf121-1.pep  EIDQASIIAWLQAHTGELSNALKAWFPVLMRQGGNIVSSIGNLLLLPLLLYYFLLDWQRW
              ||||||||||:|||||||||||||||||||:|||||||||||||| ||||||||||||||
orf121ng-1    EIDQASIIAWFQAHTGELSNALKAWFPVLMKQGGNIVSSIGNLLLPPLLLYYFLLDWQRW
                     130       140       150       160       170       180

                     190       200       210       220       230       240
orf121-1.pep  SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLVLVGLDSGFAI
              |||||||||||||||||||||||||||||||||||||||||||||||||:||||||||||
orf121ng-1    SCGIAKLVPRRFAGAYTRITGNLNEVLGEFLRGQLLVMLIMGLVYGLGLMLVGLDSGFAI
                     190       200       210       220       230       240

                     250       260       270       280       290       300
orf121-1.pep  GMLAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILSVWAVFAVGQFLESFFITPKIVG
              ||:||||||||||||||||||||||||||||||||||:||||||||||||||||||||||
orf121ng-1    GMVAGILVFVPYLGAFTGLLLATVAALLQFGSWNGILAVWAVFAVGQFLESFFITPKIVG
                     250       260       270       280       290       300

                     310       320       330       340       350
orf121-1.pep  DRIGLSPFWVIFSLMAFGQLMGFVGMLAGLPLAAVTLVLLREGVQKYFAGSFYRGRX
              ||||||||||||||||||:||||||||||||||||||||||||:|||||||||||||
orf121ng-1    DRIGLSPFWVIFSLMAFGELMGFVGMLAGLPLAAVTLVLLREGAQKYFAGSFYRGRX
                     310       320       330       340       350

In addition, ORF121ng-1 shows homology to a permease from H. influenzae: 

sp|P43969|PERM_HAEIN PUTATIVE PERMEASE PERM HOMOLOG Length = 349 
Score = 69.9 bits (168), Expect = 2e-11 

Identities = 67/317 (21%), Positives = 120/317 (37%), Gaps = 7/317 (2%) 

Query: 2 6 VYALGDTLTPFAVAAVLAYVLDPLVEWL-QKKGLNRASASMSVMVFSXXXXXXXXXXXVP 84 

+Y GD + P +A VL+Y+L+ + +L Q R A++ + VP 

Sbjct: 32 IYFFGDLIAPLLIALVLSYLLEIPINFLNQYLKCPRMLATILIFGSFIGLAAVFFLVLVP 91 

Query: 85 MLVGQFNNLASRLPQLIGFMQNTLLPWLKNTIGGYVE-IDQASIIAWFQAfiTGELSNALK 143 
ML Q +L S LP + N WL N Y E ID + + + F + ++ + 

Sbjct: 92 MLWNQTISLLSDLPAMF NKSNEWLLNLPKNYPELIDYSMVDSIFNSVREKILGFGE 147 

Query: 14 4 AW FPVLMKQGGN IVS S I GNXXXXXXXXXXXXXDWQRWS CGI AKLVPRRFAGAYTRI TGNL 2 03 

+ + + N+VS D G+++ +P+ A+ R + 

Sbjct: 148 SAVKLSLASIMNLVSLGIYAFLVPLMMFFMLKDKSELLQGVSRFLPKNRNLAFXRWK-EM 206 

Query 204 NEVLGEFLRGQXXXXXXXXXXXXXXXXXXXXDSGFAIGMVAGILVFVPYXXXXXXXXXXX 2 63 

+ + ++ G+ + + G+ V VPY 

Sbjct: 207 QQQISNYIHGKLLEILIVTLITYIIFLIFGLNYPLLLAFAVGLSVLVPYIGAVIVTIPVA 266 

Query: 264 XXXXXQFGSWNGILAVWAVFAVGQFLESFFITPKIVGDRIGLSPFWVIFSLMAFGELMGF 323 

QFG + FAV QL+ +P+ ++LP +1 S++ FG L GF 

Sbjct: 267 LVAL FQFGI S PT FWYI 1 1 AFAVSQLLDGNLLVPYL FSEAVNLHPL III I S VLI FGGLWGF 326 

Query: 324 VGMLAGL P L AAV T LVL L 34 0 

G+ +PLA + ++ 
Sbjct: 327 WGVFFAI PLATLVKAV I 343 
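
As an illustrative sketch only, the percentages in the BLAST summary line for this hit can be recomputed from the printed counts. The function name `blast_percentages` is hypothetical, and the use of integer (truncating) division is an inference from the figures printed for this hit rather than a statement about how the program works in general.

```python
import re

# Summary line for the H. influenzae hit, as printed in the BLAST output.
SUMMARY = "Identities = 67/317 (21%), Positives = 120/317 (37%), Gaps = 7/317 (2%)"


def blast_percentages(line: str) -> dict:
    """Return {label: (count, aligned_length, truncated_percent)} for each field."""
    result = {}
    for label, count, length in re.findall(r"(\w+) = (\d+)/(\d+)", line):
        count, length = int(count), int(length)
        result[label] = (count, length, 100 * count // length)
    return result


for label, (count, length, percent) in blast_percentages(SUMMARY).items():
    print(f"{label}: {count}/{length} = {percent}%")
```

Note that the denominator is the length of the alignment including gap positions (317 columns), not the length of either protein.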

Based on this analysis, including the presence of a putative leader sequence and transmembrane 
domains in the two proteins, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 



Example 94 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 791>: 

1 ..ACTGCTTTTT CGGCGGCGCT GCGCTTGAGT CCATCATGAC TCGTCATATT 
51 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT 
101 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC 
151 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 
201 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 
251 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTGTGG GTTTCTGTGC 
301 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC 
351 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 
401 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC 
451 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC 
501 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAG.. 

This corresponds to the amino acid sequence <SEQ ID 792; ORF122>: 

1 . . TAFSAALRLS PSXLVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR 

51 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRRECGFLC 

101 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 

151 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQ. . 

Further work revealed the complete nucleotide sequence <SEQ ID 793>: 

1 ATATCGTACT GGGCAAGCAG TTCGCCGGAT TTTTTGGAAG TAGATACCGC 

51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA 

101 TGGTCGAGCC GGTACCGATG CCGATATATT CATTTTCGGG TACGAATTCG 

151 ACTGCTTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT 

201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT 

251 TTTGCACGTC CTGCCCGCCG CGTTCAAATG CGTACCAGCA ATACCGCCGC 

301 CTGCGCCTCT ATGCCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 

351 TTTTGCCTTT GATGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 

401 ATGTTGGCAC GCATTTGCGG AATGTGCGGC GCGAGTTTGG GTTTCTGTGC 

451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC 

501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 

551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC 

601 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC 

651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT 

701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT 

751 CGTCATCGTT TGTGTTCCTG A 

This corresponds to the amino acid sequence <SEQ ID 794; ORF122-1>: 

1 ISYWASSSPD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PIYSFSGTNS 
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSCPP RSNAYQQYRR 
101 LRLYAFHPPE IAEFFVGFAF DVDARNVYAQ IGGDVGTHLR NVRREFGFLC 
151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 
201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV 
251 RHRLCS* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF122 shows 94.0% identity over a 182 aa overlap with an ORF (ORF122a) from strain A of N. 
meningitidis: 

10 20 30 

orf 122 .pep TAFSAALRL S P SXLVI FLS FGKPYQQTAAI 

I I I I I I : I I I I : I I I I I I I I I I I I I I I I 
orf 122a FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCWIFLSFGKPYQQTAAI 



or f 122 . pep LTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAFDVDARNVYAQIGGDVGTHLR 



100 110 120 130 140 150 

NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 
1:111 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I t I t I i I I I I M I I I I I I I II I 
NMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT 

150 160 170 180 190 200 

160 170 180 

EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ 
I I II I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I 

EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDVRHRLCSX 

The complete length ORF122a nucleotide sequence <SEQ ID 795> is: 



1 ATATCATATT GGGCAAGCAG TTCACTGGAT TTTTTGGAAG TAGATACCGC 
51 GCCTTTGATT TTTTTGCCGC TCTTACCCAA GGCTTCGATG AAAAAGTTGA 
101 TGGTCGAACC GGTACCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG 
151 ACTGCNTTTT CGGCGGCGAT GCGCTTGAGT TCGTCTTGTG TCGTCATATT 
201 TTTGTCCTTT GGGAAACCGT ATCAACAAAC AGCCGCCATC TTAACATTTT 
251 TTNNNACGTC CTGCCCGCCG CGTTCAAATC CTTACCAGCA ATACCGCCGC 
301 CTGCGACTCT ATGCCTTCCA TGCGCCCGAG ATAACCGAGT TTTTCGTTGG 
351 TTTTGCCTTT GANGTTGACG CACGAAATGT CTATGCCCAA ATCGGCGGCG 
401 ATGTTGGCAC GCATTTGCGG AATATGCGGC GCGAGTTTGG GTTTCTGTGC 
451 AATCACGGTC GTATCGACAT TGACCGCCTG CCAACCCTGC GCCTGAACGC 
501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 
551 GCGGCGGTGT CGGGGAAATG GCTGCCGATA TCGCCCAAAC CTGCCGCACC 
601 GAGCAGCGCG TCGGTAACGG CGTGCAGCAG CGCATCGGCA TCGGAGTGTC 
651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT 
701 CTGCCTTCGG TCAGTTGGTG GACATCGTAG CCCTGTCCGA TACGGATGTT 
751 CGTCATCGTT TGTGTTCCTG 



This encodes a protein having amino acid sequence <SEQ ID 796>: 

1 ISYWASSSLD FLEVDTAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS 
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFXTSCPP RSNPYQQYRR 
101 LRLYAFHAPE ITEFFVGFAF XVDARNVYAQ IGGDVGTHLR NMRREFGFLC 
151 NHGRIDIDRL PTLRLNALIR RTQKDAAVRI FELCGGVGEM AADIAQTCRT 
201 EQRVGNGVQQ RIGIGVSEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDV 
251 RHRLCS* 

ORF122a and ORF122-1 show 96.9% identity in 256 aa overlap: 



40 



ISYWASSSLDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS 
I | i | | I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I : I I I I I I I I I I I I I I M M 
I SYWASSSPD FLEVDTAPLI FLPLLPKASMKKLMVEPVPMP I YS FSGTNSTAFS AAMRLS 



10 



20 



30 



40 



50 



60 



70 80 90 100 110 120 

orf 122a . pep SSCWIFLSFGKPYQQTAAILTFFXTSCPPRSNPYQQYRRLRLYAFHAPEITEFFVGFAF 
I | i M I II I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I i I I I 
orf 122-1 SSCWIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 122a . pep XVDARNVYAQIGGDVGTHLRNMRREFGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRI 



190 200 210 220 230 240 

FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 
I I I I 1 I 1 I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I I M I I I I I I I I I I I I I I I I I I 
FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 

190 200 210 220 230 240 
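
The identity figures quoted for each comparison are the fraction of aligned positions that carry the same residue. A minimal sketch for a gapless, equal-length alignment follows. `percent_identity` is a hypothetical helper, and the alignment program used in the original analysis may treat gaps and ambiguous residues (X) differently.

```python
def percent_identity(a: str, b: str) -> tuple[int, int, float]:
    """Return (matches, overlap, percent) for two pre-aligned, gapless sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return matches, len(a), 100.0 * matches / len(a)


# Residues 1-60 of ORF122a and ORF122-1, taken from the listings above.
orf122a = "ISYWASSSLDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLS"
orf122_1 = "ISYWASSSPDFLEVDTAPLIFLPLLPKASMKKLMVEPVPMPIYSFSGTNSTAFSAAMRLS"

matches, overlap, pct = percent_identity(orf122a, orf122_1)
print(f"{matches}/{overlap} = {pct:.1f}%")
```

Over this 60-residue window the two sequences differ only at positions 9 and 42; the figure quoted in the text is computed over the full 256 aa overlap.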



Homology with a predicted ORF from N. gonorrhoeae 

ORF122 shows 89.6% identity over a 182 aa overlap with a predicted ORF (ORF122ng) from 
N. gonorrhoeae: 
orf122.pep 
orf122ng 



TAFS AALRL S P S XLVI FLS FGKP YQQT AAI 
111111:111 I : II I I I I 1 I I I I I ! I I I 
FLPLLPKASMKKLMVEPVPMPMYSFSGTNSTAFSAAMRLSSSCVVIFLSFGKPYQQTAAI 

LTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAFDVDARNVYAQIGGDVGTHLR 
I I I I I I I Mill I I I II I I 1 I I I I I I I II I I I 1 I II I I I : I I I I : : I II I I I I I I I I 
LTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAFDIDARNIDTQIGGDVGTHLR : 

orf 122 .pep NVRRECGFLCNHGRIDIDRLPTLRLNALIRRTQKDAAVRIFELCGGVGEMAADIAQTCRT : 

III I I M M I M I I II : II I M I I I I I I I I I I I I I I II I I I I II II : I I I I : I I I I I I 
orfl22ng NVRCE FGFLCNHGRIDIDHLPTLRLNALIRRTQKDAAVRIFELCGGVGKMAADVAQTCRT : 

orfl22.pep EQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQ 182 

I I I I I I I I I II : I I : I I I I I I I I M I II I I 
orf 122ng EQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLVDIVALSDTDIRHRLCS 2 5 6 

The complete length ORF122ng nucleotide sequence <SEQ ID 797> is: 

1 ATGTCGTACC GGGCAAGCAG TTCGCCGGAT TTTTTGGAGG TTGAAACCGC 
51 GCCTTTGATT TTTTTACCGC TTTTGCCCAA GGCTTCGATG AAGAAATTGa 
101 tgGTCGAACC GgtaCCGATG CCGATGTATT CGTTTTCGGG TACGAATTCG 
151 ACTGCTTTTT CGGCGGCGAT GCGCttgAgt TCgtcttgcg TcgTCATATT 
201 TTTAtccttt gGGAAaccct atcaAcaAAc agccgccatC TTAACATTTT 
251 TTTGCACGtc ctggccgccg cgttcaAATc cgtaccaGca ataccgccgc 
301 ctgcgcctCT AtgcCTTCCA TCCGCCCGAG ATAGCCGAGT TTTTCGTTGG 
351 TTTTGCCTTT GATatTGACG CACGAAATAT CGatacCCAa atcggcgGCG 
401 ATGTTGGCAC GCATTTGCGG AATGTGCGGT GCGAGTTTGG GTTTCTGTGC 
451 AATCACGGTC GTATCGACAT TGACCACCTG CCAACCCTGC GCCTGAACGC 
501 TTTGATACGC CGCACGCAAA AGGACGCGGC TGTCCGCATC TTTGAACTCT 
551 GCGGCGGTGT CGGGAAAATG GCTGCCGATG TCGCCCAAAC CTGCCGCACC 
601 GAGCAGCgcg tcggtaaCGG CGTGCAGCAG cgcgTcgGCA TCCGAATGCC 
651 CGAGCAGCCC TTTTTCAAAT GGGATTTCAA CTCCGCCAAG TATCAGCTTT 
701 CTGCCTTCGG TCAATTGGTG GACATCGTAG CCCTGTCCGA TACGGATATT 
751 CGTCATCGTT TGTGTTCCTG 



This encodes a protein having amino acid sequence <SEQ ID 798>: 

1 MSYRASSSPD FLEVETAPLI FLPLLPKASM KKLMVEPVPM PMYSFSGTNS 
51 TAFSAAMRLS SSCVVIFLSF GKPYQQTAAI LTFFCTSWPP RSNPYQQYRR 
101 LRLYAFHPPE IAEFFVGFAF DIDARNIDTQ IGGDVGTHLR NVRCEFGFLC 
151 NHGRIDIDHL PTLRLNALIR RTQKDAAVRI FELCGGVGKM AADVAQTCRT 
201 EQRVGNGVQQ RVGIRMPEQP FFKWDFNSAK YQLSAFGQLV DIVALSDTDI 
251 RHRLCS* 

ORF122ng and ORF122-1 show 92.6% identity in 256 aa overlap: 



I SYWASSSPDFLEVDTAPL I FLPLLPKASMKKLMVEPVPMPIYSFSGTNS TAFSAAMRLS 

: i I i I I I I M II : I I M II I M II II I I II I II I M M I : I M I I M II I I I M I I II 

MSYRAS S S PDFLEVETAPL I FLPLLPKASMKKLMVE PVPMPMYS FSGTNSTAFS AAMRLS 
30 40 50 60 



20 



70 80 90 100 110 120 

orf 122-1. pep SSCWIFLSFGKPYQQTAAILTFFCTSCPPRSNAYQQYRRLRLYAFHPPEIAEFFVGFAF 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I || II I I I I I I I I 
orfl22ng SSCWIFLSFGKPYQQTAAILTFFCTSWPPRSNPYQQYRRLRLYAFHPPEIAEFFVGFAF 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 122-1 . pep DVDARNVYAQIGGDVGTHLRNVRRE FGFLCNHGRIDIDRLPTLRLNAL IRRTQKDAAVRI 
I : I I I I : : I M I I I I I I I I I I I I I I I I I I I M I M 1 : I I M I I I I I I I I I I I I I I I I I 
orfl22ng DI DARN IDT Q IGGDVGTHLRNVRCE FG FLCNHGRI D I DHL PTLRLNAL I RRT QKDAAVRI 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 122-1 .pep FELCGGVGEMAADIAQTCRTEQRVGNGVQQRIGIGVSEQPFFKWDFNSAKYQLSAFGQLV 

M II M M : I I I I : I M I I I II II M I M I I : II : I IN 1 M I I M M M 11 II I I I I 
orfl22ng FELCGGVGKMAADVAQTCRTEQRVGNGVQQRVGIRMPEQPFFKWDFNSAKYQLSAFGQLV 
190 200 210 220 230 240 
orf122-1.pep 
orf122ng 



Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 



Example 95 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 799>: 

GCCGGCGCGA GTGCGAACAA CATTTCCGCG CGTTTTGCGG AAACACCCGT 
CGCTGTCAGC GTTACCCTGA TCGGCACGGT ACTTGCCGTC ATGCTGCCCG 
TTACCGAATA TGAAAACTTC CTGCTGCTTA TCGGCTCGGT ATTTGCGCCG 
ATGgGGCGGA TJTTTGATTGC CGACTTTTTC GTCTTGAAAC GGCGTGA 

This corresponds to the amino acid sequence <SEQ ID 800; ORF125>: 

1 ..AGASANNISA RFAETPVAVS VTLIGTVLAV MLPVTEYENF LLLIGSVFAP 
51 MGGFDCRLFR LETA* 

Further work revealed the complete nucleotide sequence <SEQ ID 801>: 

1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCTCCGCCA TCGGGCTGAT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC 

101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CGGCTCTACT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACGCAGC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA 

401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC 

451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAAGT 

501 CTTTTCCACG GCAGGCAGCA CCGCCGCACA GGTTTCAGAC GGCATGAGTT 

551 TCGGAACGGC AGTCGAGCTG TCCGCCGTGA TGCCGCTTTC CTGGCTGCCG 

601 CTTGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT 

651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG 

701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG 

751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTCTCCAC 

801 CGTTACCACA ACGTTTCTCG ATGCCTATTC CGCCGGCGCG AGTGCGAACA 

851 ACATTTCCGC GCGTTTTGCG GAAACACCCG TCGCTGTCGG CGTTACCCTG 

901 ATCGGCACGG TACTTGCCGT CATGCTGCCC GTTACCGAAT ATGAAAACTT 

951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG 

1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG CTTTGACTTT 

1051 GCCGGACTGG TTCTGTGGCT TGCGGGCTTC ATCCTCTACC GCTTCCTGCT 

1101 CTCGTCCGGC TGGGAAAGCA GCATCGGTCT GACCGCCCCC GTAATGTCTG 

1151 CCGTTGCCAT TGCCACCGTA TCGGTACGCC TTTTCTTTAA AAAAACCCAA 

1201 TCTTTACAAA GGAACCCGTC ATGA 

This corresponds to the amino acid sequence <SEQ ID 802; ORF125-1>: 

1 MSGNASSPSS SSAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KRGSVLFSVA NMLQLAGWTA 
101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT 
151 VSMLLMLLAV LWLSAEVFST AGSTAAQVSD GMSFGTAVEL SAVMPLSWLP 
201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL 
251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGA SANNISARFA ETPVAVGVTL 
301 IGTVLAVMLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEGFDF 
351 AGLVLWLAGF ILYRFLLSSG WESSIGLTAP VMSAVAIATV SVRLFFKKTQ 
401 SLQRNPS* 

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF125 shows 76.5% identity over a 51 aa overlap with an ORF (ORF125a) from strain A of N. meningitidis: 



10 20 30 

AGASANNI S ARFAET PVAVSVTL IGTVLAV 
I 1 : I I I I I I I : : : I I : I I : I : : : I I : I I I 
KILLGAGLGAAGILAVVLSTVTTTFLDAYSAGVSANNISAKLSEIPIAVAVAVVGTLLAV 
250 260 270 280 290 300 

40 50 60 

MLPVTEYENFLLLIGSVFAPMGGFDCRLFRLETAX 
: I I I I I I I I I I I I I I I I I I I i : 

LLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEG 
310 320 330 340 

The ORF125a partial nucleotide sequence <SEQ ID 803> is: 

1 ATGTCGGGCA ATGCCTCCTC TCNTTCATCT TCCGCCGCCA TCGGGCTGAT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACACTGC 

101 TTGCGCCTTT GGGCTGGCAG CGCGGTCTGG CNGCTCTGCT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACNCANC TCGATGGAAA GCGTGCGCCT GTCGTTCGGC AAACGCGGTT 

251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 

301 GTGATGATTT ACGCCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 

351 GTGGGACGGC GAATCTTTTG TCTGGTGGGC ATTGGCAAAC GGCGCGCTGA 

401 TTGTGCTGTG GCTGGTTTTC GGCGCACGCA AAACAGGCGG GCTGAAAACC 

451 GTTTCGATGC TGCTGATGCT GTTGGCGGTT CTGTGGCTGA GTGCCGAANT 

501 NTTTTCCACG GCAGGCAGCA CCGCCGCANN GGTNNCAGAC GGCATGAGTT 

551 TCGGAACGGC AGTCGAGCTG TCCGCCGTNA TGCCGCTTTC TTGGCTGCCG 

601 CTGGCCGCCG ACTACACGCG CCACGCGCGC CGCCCGTTTG CGGCAACCCT 

651 GACGGCAACG CTCGCCTACA CGCTGACCGG CTGCTGGATG TATGCCTTGG 

701 GTTTGGCAGC GGCGTTGTTC ACCGGAGAAA CCGACGTGGC AAAAATCCTG 

751 CTGGGCGCAG GTTTGGGTGC GGCAGGCATT TTGGCGGTCG TCCTGTCGAC 

801 CGTTACCACC ACTTTTCTCG ATGCNTACTC CGCCGGCGTA AGTGCCAACA 

851 ATATTTCCGC CAAACTTTCG GAAATACCNA TCGCCGTTGC CGTCGCCGTT 

901 GTCGGCACAC TGCTTGCCGT CCTCCTGCCC GTTACCGAAT ATGAAAACTT 

951 CCTGCTGCTT ATCGGCTCGG TATTTGCGCC GATGGCGGCG GTTTTGATTG 

1001 CCGACTTTTT CGTCTTGAAA CGGCGTGAGG AGATTGAAGG C.. 

This encodes a protein having the partial amino acid sequence <SEQ ID 804>: 

1 MSGNASSXSS SAAIGLIWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 
51 AVGGALFFAA AYIGALTGXX SMESVRLSFG KRGSVLFSVA NMLQLAGWTA 
101 VMIYAGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARKTGGLKT 
151 VSMLLMLLAV LWLSAEXFST AGSTAAXVXD GMSFGTAVEL SAVMPLSWLP 
201 LAADYTRHAR RPFAATLTAT LAYTLTGCWM YALGLAAALF TGETDVAKIL 
251 LGAGLGAAGI LAVVLSTVTT TFLDAYSAGV SANNISAKLS EIPIAVAVAV 
301 VGTLLAVLLP VTEYENFLLL IGSVFAPMAA VLIADFFVLK RREEIEG.. 



ORF125a and ORF125-1 show 94.5% identity in 347 aa overlap: 






10 20 30 40 50 60 

orf 125a. pep MSGNASSXSSSAAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 
I I I I I I I I I I : I M I I I I 1 I I I I I I I M I I I I I I I I I I I I I I I 1 I i I I I I II I I I I I I I 
orf 125-1 MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 

10 20 30 40 50 60 

70 80 90 100 110 120 

or f 125a . pep AYIGALTGXXSMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVS SALGKVLWDG 
I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I M I 1 I 1 1 I I 
orf 125-1 AYIGALTGRSSMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVSSALGKVLWDG 

70 80 90 100 110 120 



orfl25a.pep 



130 140 150 160 170 180 

ESFVWWALAN GALIVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEXFSTAGSTAAX VXD 
M I I I I 1 I 1 I I I I I I I I M I I I I II I I I I I ! M I I I M I I I I I I Ml I I 

O r f 1 2 5 - 1 ES FWWALANGAL I VLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQVSD 

130 140 150 160 170 180 

190 200 210 220 230 240 

orfl25a pep GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF 

| | | | | | | | I I I I I I I I I M I I I I I I I I I I I M I I I 1 I 1 I 1 I I I I I 

orf 125-1 GMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAALF 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 125a pep TGETDVAKILLGAGLGAAGILAWLSTVTTTFLDAYSAGVSANNISAKLSEIPIAVAVAV 
| | I I I I I I I I ! I I I I I I I II I I I I I I I I I I i I I I I I I I I : M I II ! I ::: I I : I I = I = = 
orf 12 5-1 TGETDVAKILLGAGLGAAGILAWLSTVTTTFLDAYSAGASANNISARFAETPVAVGVTL 

250 260 270 280 290 300 

310 320 330 340 

orf 125a. pep VGTLLAVLLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEG 
: I I :! I !: I I I I I I M I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
orf 125-1 IGTVLAVMLPVTEYENFLLLIGSVFAPMAAVLIADFFVLKRREEIEGFDFAGLVLWLAGF 

310 320 330 340 350 360 



Homology with a predicted ORF from N. gonorrhoeae 

ORF125 shows 86.2% identity over a 65 aa overlap with a predicted ORF (ORF125ng) from 
N. gonorrhoeae: 

orf 12 5 pep AGASANNISARFAETPVAVSVTLIGTVLAV 3 0 

I I I I I I I I I I I I I I 1111:1111 I I I I I 
orf 12 5ng KILLGAGLGITGILAVVLSTVTTTFLDTYSAGASANNISARFAEIPVAVGVTLIRTVLAV 30 8 

orf 125. pep MLPVTEYENFLLLIGSVFAPM-GGFDCRLFRLETA 64 

I I I I I I I : I II I I I I M : I I II I I I I I I I : I I 
orfl25ng MLPVTEYKNFLLLIRSVFGPMAGGFDCRL FCLKTA 343 

An ORF125ng nucleotide sequence <SEQ ID 805> was predicted to encode a protein having amino 
acid sequence <SEQ ID 806>: 

1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA 
101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT 
151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL 
201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI 
251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT 
301 LIRTVLAVML PVTEYKNFLL LIRSVFGPMA GGFDCRLFCL KTA* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 807>: 



1 ATGTCGGGCA ATGCCTCCTC TCCTTCATCT TCCGCCGCCA TCGGGCTGGT 

51 TTGGTTCGGC GCGGCGGTAT CGATTGCCGA AATCAGCACG GGTACGCTGC 

101 TCGCCCCCTT GGGCTGGCAG CGCGGTCTGG CGGCCCTGCT TTTGGGTCAT 

151 GCCGTCGGCG GCGCGCTGTT TTTTGCGGCG GCGTATATCG GCGCACTGAC 

201 CGGACGCAGC TCGATGGAAA GTGTGCGCCT GTCGTTCGGC AAATGCGGTT 
251 CAGTGCTGTT TTCCGTGGCG AATATGCTGC AACTGGCCGG CTGGACGGCG 
301 GTGATGATTT ACGTCGGCGC AACGGTCAGC TCCGCTTTGG GCAAAGTGTT 
351 GTGGGACGGC GAATCCTTTG TCTGGTGGGC ATTGGCAAAC GGCGCACTGA 
401 TCGTGCTGTG GCTGGTTTTC GGCGCACGCA GAACGGGCGG GCTGAAAACC 
451 GTTTCGATGC TGCTGATGCT GCTTGCCGTG TTGTGGTTGA GCGTCGAAGT 
501 GTTCGCTTCG TCCGGCACAA ACGCCGCGCC CGCCGTTTCA GACGGCATGA 
551 CCTTCGGAAC GGCAGTCGAA CTGTCCGCCG TCATGCCGCT TTCCTGGCTG 
601 CCGCTGGCCG CCGACTACAC GCGCCAAGCA CGCCGCCCGT TTGCGGCAAC 
651 CCTGACGGCA ACGCTCGCCT ATACGCTGAC GGGCTGCTGG ATGTATGCCT 
701 TGGGTTTGGC GGCGGCTCTG TTTACCGGAG AAACCGACGT GGCGAAAATC 
751 CTGTTGGGCG CGGGCTTGGG CATAACGGGC ATTCTGGCAG TCGTCCTCTC 
801 CACCGTTACC ACAACGTTTC TCGATACCTA TTCCGCCGGC GCGAGTGCGA 
851 ACAACATTTC CGCGCGTTTT GCGGAAATAC CCGTCGCTGT CGGCGTTACC 
901 CTGATCGGCA CGGTGCTTGC CGTCATGCTG CCCGTTACCG AATATAAAAA 

951 CTTCCTGCTG CTTATCGGCT CGGTATTTGC GCCGATGGCG GCGGTTTTGA 

1001 TTGCCGACTT TTTCGTCTTA AAACGGCGTG AGGAGATTGA AGGCTTTGAC 

1051 TTTGCCGGAC TGGTTCTGTG GCTGGCAGGC TTCATCCTCT ACCGCTTCCT 

1101 GCTCTCGTCC GGTTGGGAAA GCAGCATCGG TCTGACCGCC CCCGTAATGT 

1151 CTGCCGTTGC CATTGCCACC GTATCGGTAC GCCTTTTCTT TAAAAAAACC 

1201 CAATCTTTAC AAAGGAACCC GTCATGA 

This corresponds to the amino acid sequence <SEQ ID 808; ORF125ng-1>: 

1 MSGNASSPSS SAAIGLVWFG AAVSIAEIST GTLLAPLGWQ RGLAALLLGH 
51 AVGGALFFAA AYIGALTGRS SMESVRLSFG KCGSVLFSVA NMLQLAGWTA 
101 VMIYVGATVS SALGKVLWDG ESFVWWALAN GALIVLWLVF GARRTGGLKT 
151 VSMLLMLLAV LWLSVEVFAS SGTNAAPAVS DGMTFGTAVE LSAVMPLSWL 
201 PLAADYTRQA RRPFAATLTA TLAYTLTGCW MYALGLAAAL FTGETDVAKI 
251 LLGAGLGITG ILAVVLSTVT TTFLDTYSAG ASANNISARF AEIPVAVGVT 
301 LIGTVLAVML PVTEYKNFLL LIGSVFAPMA AVLIADFFVL KRREEIEGFD 
351 FAGLVLWLAG FILYRFLLSS GWESSIGLTA PVMSAVAIAT VSVRLFFKKT 
401 QSLQRNPS* 



ORF125ng-1 and ORF125-1 show 95.1% identity in 408 aa overlap: 



orf125-1.pep MSGNASSPSSSSAIGLIWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 
I I I I I I I I I II : I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I II 
orf125ng-1 MSGNASSPSSSAAIGLVWFGAAVSIAEISTGTLLAPLGWQRGLAALLLGHAVGGALFFAA 
10 20 30 40 50 60 


70 80 90 100 110 120 

AYIGALTGRS SMESVRLSFGKRGSVLFSVANMLQLAGWTAVMIYAGATVS SALGKVLWDG 
I II I I I I I I I I I M I I I I I I I I M I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I II I I I 
AYIGALTGRS SMESVRLSFGKCGSVLFSVANMLQLAGWTAVMIYVGATVS SALGKVLWDG 

70 80 90 100 110 120 

130 140 150 160 170 179 

ESFVWWALAN GAL IVLWLVFGARKTGGLKTVSMLLMLLAVLWLSAEVFSTAGSTAAQ-VS 
I I I I I I II I I I I I I I I I I I I I I I : I I I I I I I I I I I I I II II I I I : I I I : : : I : : I I II 
ESFVWWALAN GALIVLWLVFGARRTGGLKTVSMLLMLLAVLWLSVEVFASSGTNAAPAVS 

130 140 150 160 170 180 

180 190 200 210 220 230 239 

DGMSFGTAVELSAVMPLSWLPLAADYTRHARRPFAATLTATLAYTLTGCWMYALGLAAAL 
I I I : I I I I I I I I I I I I M I I I I I I I I I I : I I I I I I I I I I II I I I I I I I I I I II M I I I I I 
DGMTFGTAVELSAVMPLSWLPLAADYTRQARRPFAATLTATLAYTLTGCWMYALGLAAAL 
190 200 210 220 230 240 

240 250 260 270 280 290 299 

FTGETDVAKI LLGAGLGAAGILAVVLSTVTTTFLDAYSAGASANNISARFAETPVAVGVT 
I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I i I I I I I I I I 
FTGETDVAKI LLGAGLGITGILAWLSTVTTTFLDTYSAGASANNISARFAE I PVAVGVT 
250 260 270 280 290 300 

300 310 320 330 340 350 359 

LIGTVLAVML PVTEYENFLLLIGSVFAPMAAVL I ADFFVLKRREEIEGFDFAGLVLWLAG 



orf125-1.pep 
orf125ng-1 



360 370 380 390 400 

FILYRFLLSSGWESSIGLTAPVMSAVAIATVSVRLFFKKTQSLQRNPSX 



Based on this analysis, including the presence of a putative leader sequence and transmembrane 
domains in the gonococcal protein, it is predicted that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

Example 96 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 809>: 

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 
51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAAGCT 
101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TAGCCGCCGC CATGCTCGCG 
151 CCTGCAGCGG A.ACGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG 
201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 
251 CGATGATGCA GGAAAACGGC AGCCTGATTG TATGGCACGG GCAGGACAAG 
301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGT.ACGGA 
351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 
401 AACTCGGCGG ACGTTTTTAA GACGGCATCT ACCTGCCGAC CGAAGC.CAG 
451 CTCGACGGGC GGCAATTATA GTCTGCACTT GCCGACGCTT TGGACGAACT 
501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GCCTGCAAG.. 

This corresponds to the amino acid sequence <SEQ ID 810; ORF126>: 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKSCRRGEHA AAYVAAAMLA 
51 PAAXTVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK 
101 PLSSEFVRHL KRGGXTDDEI VRWRADDIAE REPQLGGRFX DGIYLPTEXQ 
151 LDGRQLXSAL ADALDELNVP CHWEHECVPE ACK... 

Further work revealed the complete nucleotide sequence <SEQ ID 811>: 

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCGGGAA GGCTGACCGC 

51 GTTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCC GAAGTGGTCA GGCTGGGCAG 
201 GCAGAGCATC CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCACA 
251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG 
301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 
351 TGACGAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 
401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 
451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 
501 GAACGTCCCC TGCCATTGGG AACACGAATG CGTCCCCGAA GGCCTGCAAG 
551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 
601 TGGAACCAAT CCCCCGAGCA CACCAGCACC CTGCGCGGCA TACGCGGCGA 
651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGTC 
701 TGCTCCATCC GCGTTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC 
751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG 
801 CGTGCGTTCA GGGTTGGAAC TCTTGTCCGC ACTCTATGCC ATCCACCCCG 
851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 
901 CTCAACCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 

951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGCCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG 

1051 CCCGAACGCG ATAAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 

1101 A 

This corresponds to the amino acid sequence <SEQ ID 812; ORF126-1>: 

1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA 
51 PAAEAVEATP EVVRLGRQSI PLWRGIRCRL NTHTMMQENG SLIVWHGQDK 
101 PLSSEFVRHL KRGGVADDEI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ 
151 LDGRQILSAL ADALDELNVP CHWEHECVPE GLQAQYDWLI DCRGYGAKTA 
201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV 
251 FVIGATQIES ESQAPASVRS GLELLSALYA IHPAFGEADI LEIATGLRPT 
301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAARL AVALFDGKDA 
351 PERDKESGLA YIRRQD* 
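
The sequence listings in this document follow a fixed layout: blocks of 10 residues, five blocks per row, each row prefixed with the 1-based position of its first residue. The sketch below reproduces that layout from a contiguous string, which can help when checking a listing for dropped or duplicated residues. `format_listing` is a hypothetical helper name, not something defined in the source.

```python
def format_listing(seq: str, block: int = 10, per_row: int = 5) -> list[str]:
    """Lay a sequence out as numbered rows of space-separated blocks."""
    row_len = block * per_row
    rows = []
    for start in range(0, len(seq), row_len):
        row = seq[start:start + row_len]
        blocks = [row[i:i + block] for i in range(0, len(row), block)]
        # Each row is prefixed with the 1-based position of its first residue.
        rows.append(f"{start + 1} " + " ".join(blocks))
    return rows


# Residues 1-60 of ORF126-1, taken from the listing above.
for line in format_listing("MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP"):
    print(line)
```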

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF126 shows 90.0% identity over a 180 aa overlap with an ORF (ORF126a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orfl2 6 pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 

I I M I I I I I 1 I I I I I I I I I I I I I I I 1 = 1 HIM! II : I I I I I 

orf 12 6a MTRIAILGGGLSGRLTALQIAEQGYQIALFDKGCRRGEHAAAWAAAMLAPAAEAVEATP 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 12 6 pep EWRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI 
I I I I I I I I II II I II 11 : I : I : I I I I I I I I I I I I I I I I I I :! M I I I I II I : I I I 
orf 12 6a EWRLGRQXIPLWRGIRCHLKTPAMMXENGSLIVWHGQDKPLSNEFVRHLKRGGVADDXI 

70 80 90 100 110 120 

130 140 150 160 170 180 

orf 12 6 . pep VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 
I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I : I I I II II I I I I I I I I I I i I I = I I 
orf 126a VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE 

130 140 150 160 170 180 

The complete length ORF 126a nucleotide sequence <SEQ ID 813> is: 

1 ATGACCCGTA TCGCCATCCT CGGCGGCGGC CTCTCNGGAA GGCTGACCGC 

51 ACTGCAGCTT GCAGAACAAG GTTATCAGAT TGCACTTTTC GATAAAGGCT 

101 GCCGCCGGGG CGAACACGCC GCCGCCTATG TTGCCGCCGC CATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA AGCCACGCCT GAAGTGGTCA GGCTGGGCAG 

201 GCAGANCATC CCGCTTTGGC GCGGCATCCG ATGCCATCTG AAAACGCCTG 
251 CCATGATGCA NGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAA 
301 CCTTTATCCA ACGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 
351 TGACNAAATC GTCCGTTGGC GCGCCGACGA CATCGCCGAA CGCGAACCGC 
401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 
451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 
501 GAACGTCCCC TGCCATTGGG AACACGAATG TGCCCCCGAA GACTTGCAAG 
551 CCCAATACGA CTGGCTGATC GACTGCCGCG GCTACGGCGC AAAAACCGCG 
601 TGGAACCAAT CCCCCGANNA NACCAGCACC CTGCGCGGCA TACGCGGCGA 
651 AGTGGCGCGG GTTTACACAC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 
701 TGCTACACCC GCGCTATCCG CTNTACATCG CCCCGAAAGA AAACCNCGTC 
751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CACCTGCCAG 
801 CGTGCGTTCC GGGCTGGAAC TCTTATCCGC ACTCTATGCC GTCCACCCCG 
851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCACCGGCCT GCGCCCCACG 
901 CTCAATCACC ACAACCCCGA AATCCGTTAC AACCGCGCCC GACGCCTGAT 
951 TGAAATCAAC GGCCTTTTCC GCCACGGTTT CATGATCTCC CCCGCCGTAA 
1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGANGCG 
1051 CCCGAACGCG ATGAAGAAAG CGGTTTGGCG TATATCCGAA GACAAGATTA 
1101 A 

This encodes a protein having amino acid sequence <SEQ ID 814>: 



1 MTRIAILGGG LSGRLTALQL AEQGYQIALF DKGCRRGEHA AAYVAAAMLA 
51 PAAEAVEATP EVVRLGRQXI PLWRGIRCHL KTPAMMXENG SLIVWHGQDK 
101 PLSNEFVRHL KRGGVADDXI VRWRADDIAE REPQLGGRFS DGIYLPTEGQ 
151 LDGRQILSAL ADALDELNVP CHWEHECAPE DLQAQYDWLI DCRGYGAKTA 
201 WNQSPXXTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENXV 
251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIATGLRPT 
301 LNHHNPEIRY NRARRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKXA 
351 PERDEESGLA YIRRQD* 

ORF126a and ORF126-1 show 95.4% identity in 366 aa overlap: 



10 20 30 40 50 60 

orf 12 6a . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 
I I I 1 II I I I I I I I I I I I I I I I I II I I I I II I I I I I i II I II I I I I I I I I I I I I I I I I M I 
or f 12 6-1 MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 

10 20 30 40 50 60 
70 80 90 100 110 120 

orf 12 6a pep EVVRLGRQXIPLWRGIRCHLKTPAMMXENGSLIWHGQDKPLSNEFVRHLKRGGVADDXI 

I I I I I I I I 111111111:1:1 : I I III I I I I I : I I I I 1 I I I I I I 1 I I I 

orf 126-1 EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 
5 70 80 90 100 110 120 

130 140 150 160 170 180 

orf 12 6a pep VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPE 

I I II I I I I I II I I I I I I I I I II I! I I I I I I M I I I M : II 

10 orf 12 6-1 VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE 

130 140 150 160 170 180 

190 200 210 220 230 240 

orf 12 6a pep DLQAQYDWLIDCRGYGAKTAWNQSPXXTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 

15 I I I I I I I M I I I I I I M I I I I I I I I I I I I I I I I I I II I I I I M II I I I I II I 

orf 12 6-1 GLQAQYDWLIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYP 

190 200 210 220 230 240 

250 260 270 280 290 300 

20 orf 12 6a . pep LYIAPKENXVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIATGLRPT 

I I I I M I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I : II M I II I I I I II I I I II I 
orf 126-1 LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT 
250 260 270 280 290 300 

25 310 320 330 340 350 360 

orfl26a.pep LNHHN PE I RYNRARRL IEINGL FRHGFMI S PAVTAAAVRLAVALFDGKXAPE RDEE S GLA 
I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I : I I I I I I I I I I 11111:11111 
orf 12 6-1 LNHHNPE I RYNRARRL IEINGLFRHGFMI S PAVTAAAARLAVALFDGKDAPERDKE SGLA 

310 320 330 340 350 360 

30 

orf 12 6a. pep YIRRQDX 
I I I I I I I 

orf 12 6-1 YIRRQDX 

35 

Homology with a predicted ORF from N. gonorrhoeae 

ORF126 shows 90% identity over a 180 aa overlap with a predicted ORF (ORF126ng) from 
N. gonorrhoeae: 

orf 126 . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKSCRRGEHAAAYVAAAMLAPAAXTVEATP 60 
40 I I I I I : I I I I I I I I I I I I I I I I I I I I I MM: I : I I I I M I M M II M M : M II I 

orfl26ng MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 60 

orf 126. pep EWRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGXTDDEI 120 
II : I I I I I I I I I I I I I I I I II I I I I II I I II II I II II I I I I I 1 I 1 II II II I Mill 
45 orfl2 6ng EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 12 0 

orf 126 . pep VRWRADDIAEREPQLGGRFXDGIYLPTEXQLDGRQLXSALADALDELNVPCHWEHECVPE 18 0 

II I I I I : I M II II I I I I I I I M I I I I I I I I I I : I I I I I I I I I II 1 I M II I I I : I : 
or f 1 2 6ng VRWRADE I AERE PQLGGRFS DG I YLPTEGQLDGRQI L SALADALDELNVPCHWEHECAPQ 180 

An ORF126ng nucleotide sequence <SEQ ID 815> was predicted to encode a protein having amino 
acid sequence <SEQ ID 816>: 

1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA 
51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK 
101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ 
151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA 
201 WNQSPEHTST LRGIRGEVRG FTRPKSRSTA PCACCTRAIR STSPRKKTTS 
251 SSSARPKSKA KAKPPPAYVP GWNSYPRSMP STPPSAKPTS SKWRPGLRPT 
301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA 
351 PERDEESGLA YIGRQD* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 817>: 

1 ATGACCCGTA TCGCCGTCCT CGGAGGCGGC CTTTCCGGAA GGCTGACCGC 

51 ATTGCAGCTT GCAGAACAAG GTTATCAGAT TGAACTTTTC GACAAGGGCA 

101 CCCGCCAAGG CGAACACGCC GCCGCCTATG TTGCCGCCGC GATGCTCGCG 

151 CCTGCGGCGG AAGCGGTCGA GGCAACGCCC GAAGTCATCA GGCTGGGCAG 

201 GCAGAGCATT CCGCTTTGGC GCGGCATCCG ATGCCGTCTG AACACGCTCA 

251 CGATGATGCA GGAAAACGGC AGCCTGATTG TGTGGCACGG GCAGGACAAG 

301 CCATTATCCA GCGAGTTCGT CCGCCATCTC AAACGCGGCG GCGTAGCGGA 

351 TGACGAAATC GTCCGTTGGC GCGCCGATGA AATCGCCGAA CGCGAACCGC 

401 AACTCGGCGG ACGTTTTTCA GACGGCATCT ACCTGCCGAC CGAAGGCCAG 

451 CTCGACGGGC GGCAAATATT GTCTGCACTT GCCGACGCTT TGGACGAACT 

501 GAACGTCCCT TGCCATTGGG AACACGAATG CGCCCCCCAA GACCTGCAAG 

551 CCCAATACGA CTGGGTAATC GACTGCCGGG GCTACGGCGC GAAAACCGCG 

601 TGGAACCAAT CCCCCGAGCA CACCAGCACC TTGCGCGGCA TACGCGGCGA 

651 AGTGGCGCGG GTTTACACGC CCGAAATCAC GCTCAACCGC CCCGTGCGCC 

701 TGCTGCACCC GCGCTATCCG CTCTACATCG CCCCGAAAGA AAACCACGTC 

751 TTCGTCATCG GCGCGACCCA AATCGAAAGC GAAAGCCAAG CCCCCGCCAG 

801 CGTACGTTCC GGGCTGGAAC TCTTATCCGC GCTCTATGCC GTCCACCCCG 

851 CCTTCGGCGA AGCCGACATC CTCGAAATCG CCGCCGGCCT GCGCCCCACG 

901 CTCAACCACC ACAACCCCGA AATCCGCTAC AGCCGCGAAC GCCGCCTCAT 

951 CGAAATCAAC GGCCTTTTCC GGCACGGCTT TATGATTTCC CCCGCCGTAA 

1001 CCGCCGCCGC CGTCAGATTG GCAGTGGCAC TGTTTGACGG AAAAGACGCG 

1051 CCCGAACGTG ATGAAGAAAG CGGTTTGGCG TATATCGGAA GACAAGATTA 

1101 A 

This corresponds to the amino acid sequence <SEQ ID 818; ORF126ng-1>: 

1 MTRIAVLGGG LSGRLTALQL AEQGYQIELF DKGTRQGEHA AAYVAAAMLA 
51 PAAEAVEATP EVIRLGRQSI PLWRGIRCRL NTLTMMQENG SLIVWHGQDK 
101 PLSSEFVRHL KRGGVADDEI VRWRADEIAE REPQLGGRFS DGIYLPTEGQ 
151 LDGRQILSAL ADALDELNVP CHWEHECAPQ DLQAQYDWVI DCRGYGAKTA 
201 WNQSPEHTST LRGIRGEVAR VYTPEITLNR PVRLLHPRYP LYIAPKENHV 
251 FVIGATQIES ESQAPASVRS GLELLSALYA VHPAFGEADI LEIAAGLRPT 
301 LNHHNPEIRY SRERRLIEIN GLFRHGFMIS PAVTAAAVRL AVALFDGKDA 
351 PERDEESGLA YIGRQD* 

ORF126ng-1 and ORF126-1 show 95.1% identity in 366 aa overlap: 

10 20 30 40 50 60 

or f 12 6-1 . pep MTRIAILGGGLSGRLTALQLAEQGYQIALFDKGCRRGEHAAAYVAAAMLAPAAEAVEATP 
I I I I I : I I I I I I 1 I I I I I I i I I I I I I I I I I I 1 I : I I I I I I I I I I I I I I I I I I I I I I I I 
orfl2 6ng-l MTRIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHAAAYVAAAMLAPAAEAVEATP 

10 20 30 40 50 60 

70 80 90 100 110 120 

orf 126-1. pep EVVRLGRQSIPLWRGIRCRLNTHTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 
I I : I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I 
orfl26ng-l EVIRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEI 
70 80 90 100 110 120 

130 140 150 160 170 180 

orf 126-1. pep VRWRADDIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECVPE 



190 200 210 220 230 240 

orf 126-1 . pep GLQAQYDWL I DCRGYGAKTAWNQS PEHTSTLRGIRGE VARVYT PE ITLNRPVRLLHPRYP 
I I I I I I I : I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
o r f 1 2 6ng- 1 DLQAQYDWVI DCRGYGAKTAWNQS PEHTSTLRGIRGE VARVYT PE ITLNRPVRLLHPRYP 

190 200 210 220 230 240 

250 260 270 280 290 300 

orf 126-1 . pep LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAIHPAFGEADILEIATGLRPT 
I I I I f I i I I I I I I I I I I I I I I i I I I I I I I I i I I I I I I I I I : I I I I I I I I M 11 I : I I I I I 
orfl2 6ng-l LYIAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPT 

250 260 270 280 290 300 

310 320 330 340 350 360 

orf 12 6-1 .pep LNHHNPEIRYNRARRLIEINGLFRHGFMISPAVTAAAARLAVALFDGKDAPERDKESGLA 
I I I I I I M I I : I I I ! I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I : | [ | | I 
orf126ng-1 LNHHNPEIRYSRERRLIEINGLFRHGFMISPAVTAAAVRLAVALFDGKDAPERDEESGLA 
310 320 330 340 350 360 



orf 12 6-1. pep YIRRQDX 
II I I I I 

orf!2 6ng-l YIGRQDX 

Furthermore, ORF126ng-1 shows homology to a putative Rhizobium oxidase flavoprotein: 

gi|2627327 (AF004408) putative amino acid oxidase flavoprotein [Rhizobium etli] 
Length = 327 
Score = 169 bits (423), Expect = 3e-41 
Identities = 112/329 (34%), Positives = 163/329 (49%), Gaps = 25/329 (7%) 

Query:   3 RIAVLGGGLSGRLTALQLAEQGYQIELFDKGTRQGEHXXXXXXXXXXXXXXXXXXXXXXX 62
           RI V G G++G   A QL   G+++ L ++    G
Sbjct:   2 RILVNGAGVAGLTVAWQLYRHGFRVTLAERAGTVGA-GASGFAGGMLAPWCERESAEEPV 60

Query:  63 IRLGRQSIPLWRGIRCRLNTLTMMQENGSLIVWHGQDKPLSSEFVRHLKRGGVADDEIVR 122
           + LGR +   W            +   G+L+V  G+D      F R    G    DE+
Sbjct:  61 LTLGRLAADWWEAA-----LPGHVHRRGTLVVAGGRDTGELDRFSRRTS-GWEWLDEVA- 113

Query: 123 WRADEIAEREPQLGGRFSDGIYLPTEGQLDGRQILSALADALDELNVPCHWEHECAPQDL 182
                IA  EP L GRF   ++   E  LD RQ L+ALA  L++  +           +
Sbjct: 114 -----IAALEPDLAGRFRRALFFRQEAHLDPRQALAALAAGLEDARMRLTLG---VVGES 165

Query: 183 QAQYDWVIDCRGYGAKTAWNQSPEHTSTLRGIRGEVARVYTPEITLNRPVRLLHPRYPLY 242
              +D V+DC G                LRG+RGE+  V T E++L+RPVRLLHPR+P+Y
Sbjct: 166 DVDHDRVVDCTGAA-------QIGRLPGLRGVRGEMLCVETTEVSLSRPVRLLHPRHPIY 218

Query: 243 IAPKENHVFVIGATQIESESQAPASVRSGLELLSALYAVHPAFGEADILEIAAGLRPTLN 302
           I P++ + F++GAT IES+   P + RS +ELL+A YA+HPAFGEA + E  AG+RP
Sbjct: 219 IVPRDKNRFMVGATMIESDDGGPITARSLMELLNAAYAMHPAFGEARVTETGAGVRPAYP 278

Query: 303 HHNPEIRYSRERRLIEINGLFRHGFMISP 331
            + P  R ++E R + +NGL+RHGF+++P
Sbjct: 279 DNLP--RVTQEGRTLHVNGLYRHGFLLAP 305


This analysis suggests that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, 
could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
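
The whole-number percentages in the BLAST summary for the Rhizobium etli hit can be reproduced from the raw counts. The following is a minimal Python sketch; the function name `blast_percent` is hypothetical, and the use of truncation rather than rounding is an assumption inferred from the reported values (163/329 is about 49.5% but is printed as 49%).

```python
def blast_percent(count: int, length: int) -> int:
    """Return count/length as a whole percent, truncated toward zero.

    Truncation (rather than rounding) is an assumption inferred from the
    reported figures, e.g. 163/329 is printed as 49%.
    """
    return (100 * count) // length


# Counts copied from the summary line of the Rhizobium etli alignment.
identities = blast_percent(112, 329)
positives = blast_percent(163, 329)
gaps = blast_percent(25, 329)

print(f"Identities = {identities}%, Positives = {positives}%, Gaps = {gaps}%")
```

Under this assumption the three values agree with the 34%, 49% and 7% given in the report.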



Example 97 

The following DNA sequence, believed to be complete, was identified in N. meningitidis <SEQ ID 
819>: 



1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC 

201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGtCGCG CGGG..GCTT TAGACAGTAA ATTCATGTTG 

301 AAGGCGGTAG CCATAGATAA AGATAAAAAT CCTTTTATTA TTAAGATGAA 
351 TGAAAATCTA GTAACCTTTA §.TTTGCAAGA AGTCCGCCAG TTCGTGTAGT 

401 GACGGGCTGG ATTATTTTAA AGGAAATGAT AAGGACTGCA AGTTACTTAA 
451 GTAG 

This corresponds to the amino acid sequence <SEQ ID 820; ORF127>: 



1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA 
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIVA RXALDSKFML 
101 KAVAIDKDKN PFIIKMNENL VTFICKKSAS SCSDGLDYFK GNDKDCKLLK 
151 * 

Further work revealed the following DNA sequence <SEQ ID 821>: 

1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 
51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 
101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 
151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGGTTTA AACAAACATC 
201 TACCAAGTGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 
251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG 
301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 
351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 
401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This corresponds to the amino acid sequence <SEQ ID 822; ORF127-1>: 

1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAALLENA 
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK* 
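
The correspondence between SEQ ID 821 and SEQ ID 822 can be spot-checked by translating the nucleotide sequence in frame 1. The sketch below is illustrative only: it assumes the standard genetic code, the names `CODON_TABLE` and `translate` are hypothetical, and the two test fragments are nucleotides 1-60 and 421-450 of SEQ ID 821 as printed, with the spacing removed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: the
# standard code applies; only unambiguous A/C/G/T bases are handled).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1; whitespace is ignored, '*' marks a stop."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # drop any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Fragments of SEQ ID 821 copied from the listing.
first_60_nt = "ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT GATATTGTCT"
last_30_nt = "AATGATAAGG ACTGCAAGTT ACTTAAGTAG"

print(translate(first_60_nt))  # N-terminal residues of ORF127-1
print(translate(last_30_nt))   # C-terminal residues and the stop codon
```

A base that is not A, C, G or T (for example an ambiguity symbol in a partial sequence) raises a `KeyError` in this sketch, which makes unreadable positions easy to locate.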

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A) 

ORF127 shows 98.0% identity over a 150 aa overlap with an ORF (ORF127a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

orf 127 . pep MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 
I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I i I i I I I I : I I I I I I I I I I I I I I I I I I I 
orf 127a MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 127 . pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL 
] I I I I M I ! I I I I I I 1 II I I I I I I I I I I II I I II I I I I I I I I I I I I I I I I I I I I I I I I 
orf 127 a GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL 

70 80 90 100 110 



130 140 150 

orf 127 .pep VTFICKKSASSCSDGLDYFKGNDKDCKLLKX 
I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I 
orf 127a VTFICKKSASSCSDGLDYFKGNDKDCKLLKX 
120 130 140 150 

The complete length ORF127a nucleotide sequence <SEQ ID 823> is: 



1 ATGACTGATA ATCGGGGGTT TACGCTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT ACAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC 

201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCCTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 824>: 



1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN TVRAALLENA 
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDGLDYFKG NDKDCKLLK* 

ORF127a and ORF127-1 show 99.3% identity in 149 aa overlap: 



10 20 30 40 50 60 

orf 127a. pep MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINTVRAALLENAHFMEKFYLQN 
I I I I I I I I I I i I II I I I II I I I I I I I I I I I I I I I I I II I I : II I I I I II I II I I II I I II 
orf 127-1 MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 
10 20 30 40 50 60 

70 80 90 100 110 120 

orfl27a pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 

I I I I I I I I I I M !! I I I I M I I I I I I I M M M 1 I I I I I I I I I I I I I I 1 

orf 127-1 GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 
70 80 90 100 110 120 



130 140 150 

orf 127a. pep TFICKKSASSCSDGLDYFKGNDKDCKLLKX 
I I I I I I I I M I I I I I I I I I I I I I I I I I I I I 
orfl27-l TFICKKSAS S CS DGLDYFKGNDKDCKLLKX 

130 140 150 
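
The 99.3% identity reported for ORF127a and ORF127-1 can be checked directly from the two protein sequences. This is a minimal sketch for an ungapped comparison only; the function name `percent_identity` is hypothetical, and the sequences are transcribed from SEQ ID 824 and SEQ ID 822 with spacing removed and without the terminal stop symbol.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length for an ungapped comparison")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# ORF127a (SEQ ID 824) and ORF127-1 (SEQ ID 822), 149 residues each.
ORF127A = (
    "MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINTVRAALLENA"
    "HFMEKFYLQNGRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLK"
    "AVAIDKDKNPFIIKMNENLVTFICKKSASSCSDGLDYFKGNDKDCKLLK"
)
ORF127_1 = (
    "MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENA"
    "HFMEKFYLQNGRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLK"
    "AVAIDKDKNPFIIKMNENLVTFICKKSASSCSDGLDYFKGNDKDCKLLK"
)

print(f"{percent_identity(ORF127A, ORF127_1):.1f}% identity in {len(ORF127A)} aa overlap")
```

The two sequences differ at a single position (T versus A at residue 41), giving 148 matches out of 149. Gapped alignments, such as the ORF127 versus ORF127a comparison, would need the gap columns counted in the denominator and are outside this sketch.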



Homology with a predicted ORF from N. gonorrhoeae 

ORF127 shows 97.3% identity over a 150 aa overlap with a predicted ORF (ORF127ng) from 
N. gonorrhoeae: 

orf 127 .pep MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 60 

I I I I I I I I I I I I I I 1 I I I I I I I I I I 1 I I I I I 1 I I I I I I I I I I I I I : I I I I I I I I I I ! I I I 
orfl27ng MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAAFLENAHFMEKFYLQN 60 

orf 127 .pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIVARXALDSKFMLKAVAIDKDKNPFIIKMNENL 120 

I M I I I I I I I I I M I I I I I I I I I I I I I I M I 1 I I I I I I I I I I I I I I I I I M I ! I I I I I 
orf 127ng GRFKQTSTKWPSLPIKEAEGFCIRLNGI-ARGALDSKFMLKAVAIDKDKNPFIIKMNENL 119 

orf 127. pep VTFICKKSASSCSDGLDYFKGNDKDCKLLK 150 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orfl27ng VTFICKKSASSCSDRLDYFKGNDKDCKLLK 14 9 

The complete length ORF127ng nucleotide sequence <SEQ ID 825> is: 



1 ATGACTGATA ATCGGGGGTT TACACTGGTT GAATTAATAT CAGTGGTCTT 

51 GATATTGTCT GTACTTGCTT TAATTGTTTA TCCGAGCTAT CGCAATTATG 

101 TTGAGAAAGC AAAGATAAAT GCAGTGCGGG CAGCCTTGTT AGAAAATGCA 

151 CATTTTATGG AAAAGTTTTA TCTGCAGAAT GGGAGATTTA AACAAACATC 

201 TACCAAATGG CCAAGTTTGC CGATTAAAGA GGCAGAAGGC TTTTGTATCC 

251 GTTTGAATGG AATCGCGCGC GGGGCTTTAG ACAGTAAATT CATGTTGAAG 

301 GCGGTAGCCA TAGATAAAGA TAAAAATCCT TTTATTATTA AGATGAATGA 

351 AAATCTAGTA ACCTTTATTT GCAAGAAGTC CGCCAGTTCG TGTAGTGACG 

401 GGCTGGATTA TTTTAAAGGA AATGATAAGG ACTGCAAGTT ACTTAAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 826>: 



1 MTDNRGFTLV ELISVVLILS VLALIVYPSY RNYVEKAKIN AVRAAFLENA 
51 HFMEKFYLQN GRFKQTSTKW PSLPIKEAEG FCIRLNGIAR GALDSKFMLK 
101 AVAIDKDKNP FIIKMNENLV TFICKKSASS CSDRLDYFKG NDKDCKLLK* 

ORF127ng and ORF127-1 show 100.0% identity in 149 aa overlap: 



10 20 30 40 50 60 

orf 127-1 . pep MTDNRGFTLVELISWLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 
I I 1 I I I I I I I ! I I I I I I I I I I I I I II I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I II I 
orfl27ng-l MTDNRGFTLVELISVVLILSVLALIVYPSYRNYVEKAKINAVRAALLENAHFMEKFYLQN 

10 20 30 40 50 60 



70 80 90 100 110 120 

orf 127-1. pep GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 
I I 1 I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I 
orfl27ng-l GRFKQTSTKWPSLPIKEAEGFCIRLNGIARGALDSKFMLKAVAIDKDKNPFIIKMNENLV 

70 80 90 100 110 120 



130 140 150 

or f 1 2 7 - 1 . pep TFICKKSASS CS DGLDYFKGNDKDCKLLKX 

orfl27ng-l T FI CKKS AS S C S DGLDYFKGNDKDCKLLKX 

130 140 150 
This analysis, including the fact that the predicted transmembrane domain is shared by the 
meningococcal and gonococcal proteins, suggests that the proteins from N. meningitidis and 
N. gonorrhoeae, and their epitopes, could be useful antigens for vaccines or diagnostics, or for 
raising antibodies. 

Example 98 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 827>: 

1 . . GTGTCGCTGG CTTCGGTGAT TGCCTCTCAA ATCTTCCTTT ACGAAGATTT 

51 CAACCAAATG CGGAAAACC£ GTGGAGCTAT CTGCGGTTTT CTTGTCCAAT 

101 ATTTATCTGG GGTTTCAGCA GGGGTATTTC GATTTGAGTG CCGACGAGAA 

151 CCCCGTACTG CATATCTGGT CTTTGGCAGT AGAGGAACAG TATTACCTCC 

201 TGTATCCCCT TTTGCTGATA TTTTGCTGCA AAAAAACCAA ATCGCTACGG 

251 GTGCTGCGTA ACATCAGCAT CATCCTGTTT TTGATTTTGA CTGCCTCATC 

301 GTTTTTGCCA AGCGGGTTTT ATACCGACAT CCTCAACCAA CCCAATACTT 

351 ATTACCTTTC GACACTGAGG TTTCCCGAGC TGTTGGCAGG TTCGCTGCTG 

401 GCGGTTTACG GGCAAACGCA AAACGGCAGA CGGCAAACAG CAAATGGAAA 

451 ACGGCAGTTG CTTTCATCAC TCTGCTTCGG CGCATTGCTT GCCTGCCTGT 

501 TCGTGATTGA CAAACACAAT CCGTTTATCC CGGGAATGAC CCTGCTCCTT 

551 CCCTGCCTGC TGACGGCACT GCTTATCCGG AGTATGCAAT ACGGGACACT 

601 TCCGACCCGC ATCCTGTCGG CAAGCCCCAT CGTATTTGTC GGCAAAATCT 

651 CTTATTCCCT ATACCTGTAC CATTGGATTT TTATTGCTTT CGCTCCGCTC 

701 ATTAGAGGCG GGAAACAGCT CGGACTGCCT GCCG. . 

This corresponds to the amino acid sequence <SEQ ID 828; ORF128>: 



1 ..VSLASVIASQ IFLYEDFNQM RKTVELSAVF LSNIYLGFQQ GYFDLSADEN 

51 PVLHIWSLAV EEQYYLLYPL LLIFCCKKTK SLRVLRNISI ILFLILTASS 

101 FLPSGFYTDI LNQPNTYYLS TLRFPELLAG SLLAVYGQTQ NGRRQTANGK 

151 RQLLSSLCFG ALLACLFVID KHNPFIPGMT LLLPCLLTAL LIRSMQYGTL 

201 PTRILSASPI VFVGKISYSL YLYHWIFIAF APLIRGGKQL GLPA. . 

Further work revealed the complete nucleotide sequence <SEQ ID 829>: 



1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC 
51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 
101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC 
151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 
201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCCTTTATT GCGGCCGTGT 
251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC 
301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA 
351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 
401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT 
451 CCCCTTTTGC TGATATTTTG CTGCAAAAAA ACCAAATCGC TACGGGTGCT 
501 GCGTAACATC AGCATCATCC TGTTTTTGAT TTTGACTGCC TCATCGTTTT 
551 TGCCAAGCGG GTTTTATACC GACATCCTCA ACCAACCCAA TACTTATTAC 
601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT 
651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC 
701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG 
751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG 
801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 
851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 
901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC 
951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 
1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA 
1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC 
1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC 
1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT 
1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGAG 
1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC 
1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC 
1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT 
1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCTGTGCCGA 
1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG 
1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA 
1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT 
1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC 
1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA 
1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG 
1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT 
1801 TATATGGGGC GGGAATTCCA CAAACACGAA CGCCTGCTTA AATCTTCCCA 
1851 CGGCGGCGCA TTGCAGTAG 



This corresponds to the amino acid sequence <SEQ ID 830; ORF128-1>: 



1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT 
51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN 
101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY 
151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA SSFLPSGFYT DILNQPNTYY 
201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV 
251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY 
301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR 
351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH 
401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD 
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR 
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG 
551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY 
601 YMGREFHKHE RLLKSSHGGA LQ* 



Computer analysis of this amino acid sequence gave the following results: 

Homology with hypothetical integral membrane protein HI0392 of H. influenzae (accession number U32723) 
ORF128 and HI0392 show 52% aa identity in 180aa overlap: 

Orf128:   1 VSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGFQQGYFDLSADENPVLHIWSLAV 60
            ++L S IAS IF+Y DFN++RKT+EL+  FLSN YLG  QGYFDLSA+ENPVLHIWSLAV
HI0392:  46 MALVSFIASAIFIYNDFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAV 105

Orf128:  61 EEQXXXXXXXXXIFCCKKTKSLRVLRNISIILFLILTASSFLPSGFYTDILNQPNTYYLS 120
            E Q         I   KK + ++VL  I++ILF IL A+SF+ + FY ++L+QPN YYLS
HI0392: 106 EGQYYLIFPLILILAYKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLS 165

Orf128: 121 TLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLCFGALLACLFVIDKHNPFIPGMT 180
             LRFPELL GSLLA+Y    N + Q +     +L+ L    L +CLF+++ +  FIPG+T
HI0392: 166 NLRFPELLVGSLLAIYHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF128 shows 98.0% identity over a 244aa overlap with an ORF (ORF128a) from strain A of N. 
meningitidis: 



                                                  10        20        30
orf128.pep                                VSLASVIASQIFLYEDFNQMRKTVELSAVF
                                          ||||||||||||||||||||||||||||||
orf128a     ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVF
                  60        70        80        90       100       110

                    40        50        60        70        80        90
orf128.pep  LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128a     LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI
                 120       130       140       150       160       170

                   100       110       120       130       140       150
orf128.pep  ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK
            ||||||||:|||||||||||||||||||||||||||||||||||||||||||||||||||
orf128a     ILFLILTATSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK
                 180       190       200       210       220       230

                   160       170       180       190       200       210
orf128.pep  RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI
            ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128a     RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI
                 240       250       260       270       280       290

                   220       230       240
orf128.pep  VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA
            |||||||||||||||||||||  | | |||||||
orf128a     VFVGKISYSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR
                 300       310       320       330       340       350
300 310 320 330 340 350 



The complete length ORF128a nucleotide sequence <SEQ ID 831> is: 



1 ATGCAAGCTG TCCGATACAG ACCGGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATGATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCAGGATT CCTCATTACC 

151 GGCATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT 

251 CGCTGGCTTC GGTGATTGCC TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGCGGA AAACCGTGGA GCTTTCTGCG GTTTTCTTGT CCAATATTTA 

351 TCTGGGGTTT CAGCAGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 

401 TACTGCATAT CTGGTCTTTG GCAGTAGAGG AACAGTATTA CCTCCTGTAT 

451 CCTCTTTTGC TGATATTTTG CTGCAAAAAA ACAAAATCGC TACGGGTGCT 

501 GCGTAACATC AGCATCATCC TATTTCTGAT TTTGACTGCC ACATCGTTTT 

551 TGCCAAGCGG GTTTTATACC GATATTCTCA ACCAACCCAA TACTTATTAC 

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GCAGGTTCGC TGCTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGCAAAT GGAAAACGGC 

701 AGTTGCTTTC ATCACTCTGC TTCGGCGCAT TGCTTGCCTG CCTGTTCGTG 

751 ATTGACAAAC ACAATCCGTT TATCCCGGGA ATGACCCTGC TCCTTCCCTG 

801 CCTGCTGACG GCACTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 

851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 

901 TCCCTATACC TGTACCATTG GATTTTTATT GCTTTCGCCC ATTACATTAC 

951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGTTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTCT ATCTCGCCCC 

1101 GTCCCTGATA CTTGTCGGTT ACAACCTGTA CGCAAGGGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGTTG CCCGGCGCGC CCCTTGCTGC GGAAAATCAT 

1201 TTTCCGGAAA CCGTCCTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG 

1251 GGGGTTTCTG GATTATGTCG GCAGCCGGGA AGGGTGGAAA GCCAAAATCC 

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TAGATGAGAA GCTGGCAGAC 

1351 AACCCGTTAT GTCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCCGT 

1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA 

1451 GATTTGAAGC GCAATCCTTC CTAATACCCG GGTTCCCAGC CCGATTCAGG 

1501 GAAACCGTCA AAAGGATAGC CGCCGTCAAA CCCGTCTATG TTTTTGCAAA 

1551 CAACACATCA ATCAGCCGTT CGCCCCTGAG GGAGGAAAAA TTGAAAAGAT 

1601 TTGCCGCAAA CCAATATCTC CGCCCCATTC AGGCTATGGG CGACATCGGC 

1651 AAGAGCAATC AGGCGGTCTT TGATTTGATT AAAGATATTC CCAATGTGCA 

1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATATACG 

1751 GCCGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT 

1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTTA AATCTTCTCG 

1851 CGACGGCGCA TTGCAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 832>: 

1 MQAVRYRPEI DGLRAVAVLS VMIFHLNNRW LPGGFLGVDI FFVISGFLIT 
51 GIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN 
101 QMRKTVELSA VFLSNIYLGF QQGYFDLSAD ENPVLHIWSL AVEEQYYLLY 
151 PLLLIFCCKK TKSLRVLRNI SIILFLILTA TSFLPSGFYT DILNQPNTYY 
201 LSTLRFPELL AGSLLAVYGQ TQNGRRQTAN GKRQLLSSLC FGALLACLFV 
251 IDKHNPFIPG MTLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY 
301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR 
351 KRKMTFKKAF FCLYLAPSLI LVGYNLYARG ILKQEHLRPL PGAPLAAENH 
401 FPETVLTLGD SHAGHLRGFL DYVGSREGWK AKILSLDSEC LVWVDEKLAD 
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFPARFR 
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAANQYL RPIQAMGDIG 
551 KSNQAVFDLI KDIPNVHWVD AQKYLPKNTV EIYGRYLYGD QDHLTYFGSY 
601 YMGREFHKHE RLLKSSRDGA LQ* 

ORF128a and ORF128-1 show 99.5% identity in 622 aa overlap: 

orf128a.pep  MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1     MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG

orf128a.pep  SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1     SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF

orf128a.pep  QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1     QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA

orf128a.pep  TSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC
             :|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1     SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC

orf128a.pep  FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1     FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY

orf128a.pep  SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1     SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMTFKKAF

orf128a.pep  FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1     FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL

orf128a.pep  DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1     DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ

orf128a.pep  PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1     PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL

orf128a.pep  RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf128-1     RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY

orf128a.pep  YMGREFHKHERLLKSSRDGALQX
             ||||||||||||||||: |||||
orf128-1     YMGREFHKHERLLKSSHGGALQX



Homology with a predicted ORF from N.gonorrhoeae 

ORF128 shows 93.4% identity over 244 aa overlap with a predicted ORF (ORF128ng) from N. 
gonorrhoeae: 



orf 128. pep VSLASVIASQIFLYEDFNQMRKTVELSAVF 30 

I I 1 1 I I I I I I I I I I I I I I 1 I I I I : I ! I : 1 I 

orfl28ng ILSEIQNGSFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVF 112 

orf 128 . pep LSNIYLGFQQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISI 90 

I N I M I I : I II I I I I I II I I I I I I I | | | | | | | | | | | | | | | | || | | | | || || || | | | | 

orfl28ng LSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISI 172 

orf 128 .pep ILFLILTASSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGK 150 

: I i I I I I I : I I I I Ml 

orfl2 8ng ILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGK 232 

orfl28 .pep RQLLSSLCFGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPI 210 

INN lll)ll|:||lll]l|:|||||:|]|||lllllllllllllll|||l||||||| 

orfl28ng RQLLSLLCFGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPI 292 
orf128.pep VFVGKISYSLYLYHWIFIAFAPLIRGGKQLGLPA 2 

M M M I I I 1 I I I I I I I M II I I I I I I I I I 
orf 12 8ng VFVGKIS YSLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKR 3 

The complete length ORF128ng nucleotide sequence <SEQ ID 833> is: 

1 ATGCAAGCTG TCCGATACAG GCCTGAAATT GACGGATTGC GGGCCGTCGC 

51 CGTGCTATCC GTCATTATTT TCCACCTGAA TAACCGCTGG CTGCCCGGAG 

101 GATTCCTGGG GGTGGACATT TTCTTTGTCA TCTCGGGATT CCTCATTACC 

151 AACATCATTC TTTCTGAAAT ACAGAACGGT TCTTTTTCTT TCCGGGATTT 

201 TTATACCCGC AGGATTAAGC GGATTTATCC TGCTTTTATT GCGGCCGTGT 

251 CCCTGGCTTC GGTGATTGCT TCTCAAATCT TCCTTTACGA AGATTTCAAC 

301 CAAATGAGGA AAACCATAGA GCTTTCTACG GTTTTTTTGT CCAATATTTA 

351 TTTGGGGTTC CGATTGGGGT ATTTCGATTT GAGTGCCGAC GAGAACCCCG 

401 TACTGCATAT CTGGTCTTTG GCGGTAGAGG AACAGTATTA CCTCCTGTAT 

451 CCTCTTTTGC TGATATTCTG TTACAAAAAA ACCAAATCAC TACGGGTGCT 

501 GCGTAATATC AGCATCATCC TGTTTCTGAT TTTGACCGCA TCATCGTTTT 

551 TGCCGGCCGG GTTTTATACC GACATCCTCA ACCAACCcaa TACTTATTAC 

601 CTTTCGACAC TGAGGTTTCC CGAGCTGTTG GTGGGTTCGC TGTTGGCGGT 

651 TTACGGGCAA ACGCAAAACG GCAGACGGCA AACAGAAAAT GGAAAACGGC 

701 AGTTGCTTTC ATTACTCTGT TTCGGCGCat tgCTTGTCTG CCTGTTCGTG 

751 ATCGACAAAC ACGATCCGTT TATCCCGGGA ATAACCCTGC TCCTTCCCTG 

801 CCTGCTGACG GCGCTGCTTA TCCGGAGTAT GCAATACGGG ACACTTCCGA 
851 CCCGCATCCT GTCGGCAAGC CCCATCGTAT TTGTCGGCAA AATCTCTTAT 
901 TCCCTATACC TGTACCATTG GATTTTTATT GCCTTCGCCC ATTACATTAC 
951 AGGCGACAAA CAGCTCGGAC TGCCTGCCGT ATCGGCGGTT GCCGCGTTGA 

1001 CGGCCGGATT TTCCCTGTTG AGCTATTATT TGATTGAACA GCCGCTTAGA 

1051 AAACGGAAGA TGACCTTCAA AAAGGCATTT TTCTGCCTTT ATCTCGCCCC 

1101 GTCCCTGATG CTTGTCGGTT ACAACCTGTA TTCAAGAGGG ATATTGAAAC 

1151 AGGAACACCT CCGCCCGCTG CCCGGCACGC CCGTTGCTGC GGAAAATAAT 

1201 TTTCCGGAAA CCGTCTTGAC CCTCGGCGAC TCGCACGCCG GACACCTGCG 

1251 GGGGTTTCTG GATTATGTCG GCGGCAGGGA AGGGTGGAAA GCTAAAATCC 

1301 TGTCCCTCGA TTCGGAGTGT TTGGTTTGGG TGGATGAGAA GCTGGCAGAC 

1351 AACCCGTTGT GCCGAAAATA CCGGGATGAA GTTGAAAAAG CCGAAGCTGT 

1401 TTTCATTGCC CAATTCTATG ATTTGAGGAT GGGCGGCCAG CCCGTGCCGA 

1451 GATTTGAAGC GCAATCCTTC CTGATACCCG GGTTCAAAGC CCGATTCAGG 

1501 GAAACCGTCA AGAGGATAGC CGCCGTCAAA CCTGTATATG TTTTTGCAAA 

1551 CAATACATCA ATCAGCCGTT CTCCCTTGAG GGAGGAAAAA TTGAAAAGAT 

1601 TTGCTATAAA CCAATACCTC CGGCCTATTC GGGCTATGGG CGACATCGGC 

1651 AAGAGCAATC AGGCGGTCTT TGATTTGGTT AAAGATATTC CCAATGTGCA 

1701 TTGGGTGGAC GCACAAAAAT ACCTGCCCAA AAACACGGTC GAAATACACG 

1751 GACGCTATCT TTACGGCGAC CAAGACCACC TGACCTATTT CGGTTCTTAT 

1801 TATATGGGGC GGGAATTTCA CAAACACGAA CGCCTGCTCA AGCATTCCCG 

1851 AGGCGGCGCA TTGCAGTAG 

This encodes a protein having amino acid sequence <SEQ ID 834>: 

1 MQAVRYRPEI DGLRAVAVLS VIIFHLNNRW LPGGFLGVDI FFVISGFLIT 
51 NIILSEIQNG SFSFRDFYTR RIKRIYPAFI AAVSLASVIA SQIFLYEDFN 
101 QMRKTIELST VFLSNIYLGF RLGYFDLSAD ENPVLHIWSL AVEEQYYLLY 
151 PLLLIFCYKK TKSLRVLRNI SIILFLILTA SSFLPAGFYT DILNQPNTYY 
201 LSTLRFPELL VGSLLAVYGQ TQNGRRQTEN GKRQLLSLLC FGALLVCLFV 
251 IDKHDPFIPG ITLLLPCLLT ALLIRSMQYG TLPTRILSAS PIVFVGKISY 
301 SLYLYHWIFI AFAHYITGDK QLGLPAVSAV AALTAGFSLL SYYLIEQPLR 
351 KRKMTFKKAF FCLYLAPSLM LVGYNLYSRG ILKQEHLRPL PGTPVAAENN 
401 FPETVLTLGD SHAGHLRGFL DYVGGREGWK AKILSLDSEC LVWVDEKLAD 
451 NPLCRKYRDE VEKAEAVFIA QFYDLRMGGQ PVPRFEAQSF LIPGFKARFR 
501 ETVKRIAAVK PVYVFANNTS ISRSPLREEK LKRFAINQYL RPIRAMGDIG 
551 KSNQAVFDLV KDIPNVHWVD AQKYLPKNTV EIHGRYLYGD QDHLTYFGSY 
601 YMGREFHKHE RLLKHSRGGA LQ* 

ORF128ng and ORF128-1 show 95.7% identity in 622 aa overlap: 

orf 12 8-1. pep MQAVRYRPEIDGLRAVAVLSVMIFHLNNRWLPGGFLGVDIFFVISGFLITGIILSEIQNG 
I II II I I II I I I I I I I I I I I I : I I I I M I I I I I I II M I I I I M I M I I I : I I II I I I I I 

orf 128ng MQAVRYRPE I DGLRAVAVLSVI I FHLNNRWLPGGFLGVDI FFVISGFLITNI ILSEIQNG 



orf 128-1 . pep SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTVELSAVFLSNIYLGF 






I i I I I I I I I I I I II I M I I I I I I I I I I I I I I I I I I I I I I ! I I I I I : 1 1 I : I I i I I I I I I I 
SFSFRDFYTRRIKRIYPAFIAAVSLASVIASQIFLYEDFNQMRKTIELSTVFLSNIYLGF 

QQGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCCKKTKSLRVLRNISIILFLILTA 
: I I I I I I I I I I I I I I I I I I I I I I M M M I I I I I I 1 I I I I I M I I I II II II II I I M 
RLGYFDLSADENPVLHIWSLAVEEQYYLLYPLLLIFCYKKTKSLRVLRNISIILFLILTA 

SSFLPSGFYTDILNQPNTYYLSTLRFPELLAGSLLAVYGQTQNGRRQTANGKRQLLSSLC 

II II I : I I M I I I I II I M I I I I I I M II 1 : I II II I I 1 I 1 I 1 I I I I I I I I I I I II II 
SSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAVYGQTQNGRRQTENGKRQLLSLLC 

FGALLACLFVIDKHNPFIPGMTLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 
I I I I I : I I I I I II I : I I I I I : I II I I I I I I II I I I I I I I I I I II II 1 1 I I I I I I I I I I I I 
FGALLVCLFVIDKHDPFIPGITLLLPCLLTALLIRSMQYGTLPTRILSASPIVFVGKISY 

SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMT FKKAF 
I II I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I II I II I I I II I I II I I I I M I 
SLYLYHWIFIAFAHYITGDKQLGLPAVSAVAALTAGFSLLSYYLIEQPLRKRKMT FKKAF 

FCLYLAPSLILVGYNLYARGILKQEHLRPLPGAPLAAENHFPETVLTLGDSHAGHLRGFL 
I I I I I I M I : II I I I I I : I I I M M I I M M I : I : II M : I II 11 I ! I I II M M I II M 
FCLYLAPSLMLVGYNLYSRGILKQEHLRPLPGTPVAAENNFPETVLTLGDSHAGHLRGFL 

DYVGSREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 
M II : II I I I I I I I M I I I I I I I I I I I II I I I I I I I II II II II I I I I I I I I I I I I I I I I 
DYVGGREGWKAKILSLDSECLVWVDEKLADNPLCRKYRDEVEKAEAVFIAQFYDLRMGGQ 

PVPRFEAQSFLIPGFPARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAANQYL 

M I II I I I I I I I I M I I I M I I 1 II I I I I I I I M I II I I I I I I I I I I I I I I I I I I | I I 
PVPRFEAQSFLIPGFKARFRETVKRIAAVKPVYVFANNTSISRSPLREEKLKRFAINQYL 

RPIQAMGDIGKSNQAVFDLIKDIPNVHWVDAQKYLPKNTVEIYGRYLYGDQDHLTYFGSY 

I I I : I I I I I M I I I I I I I I : I I I I I I I I I I I I M I I I I I I I I : I I I I I I I I I I I M I I M 
RPIRAMGDIGKSNQAVFDLVKDIPNVHWVDAQKYLPKNTVEIHGRYLYGDQDHLTYFGSY 

YMGREFHKHERLLKSSHGGALQX 
I I I I M I I II M M I : I II i M 

YMGREFHKHERLLKHSRGGALQX 

610 620 

In addition, ORF128ng shows homology to a hypothetical H. influenzae protein: 

sp|P43993|Y392_HAEIN HYPOTHETICAL PROTEIN HI0392 >gi|1074385|pir||B64007 
hypothetical protein HI0392 - Haemophilus influenzae (strain Rd KW20) 
>gi|1573364 (U32723) H. influenzae predicted coding region HI0392 [Haemophilus 
influenzae] Length = 245 
Score = 239 bits (604), Expect = 3e-62 
Identities = 124/225 (55%), Positives = 152/225 (67%), Gaps = 1/225 (0%) 

Query:  38 VDIFFVISGFLITNIILSEIQNGSFSFRDFYTRRIKRIYPXXXXXXXXXXXXXXXXFLYE 97
           +DIFFVISGFLIT II++EIQ  SFS + FYTRRIKRIYP                F+Y
Sbjct:   1 MDIFFVISGFLITGIIITEIQQNSFSLKQFYTRRIKRIYPAFITVMALVSFIASAIFIYN 60

Query:  98 DFNQMRKTIELSTVFLSNIYLGFRLGYFDLSADENPVLHIWSLAVEEQXXXXXXXXXIFC 157
           DFN++RKTIEL+  FLSN YLG   GYFDLSA+ENPVLHIWSLAVE Q         I
Sbjct:  61 DFNKLRKTIELAIAFLSNFYLGLTQGYFDLSANENPVLHIWSLAVEGQYYLIFPLILILA 120

Query: 158 YKKTKSLRVLRNISIILFLILTASSFLPAGFYTDILNQPNTYYLSTLRFPELLVGSLLAV 217
           YKK + ++VL  I++ILF IL A+SF+ A FY ++L+QPN YYLS LRFPELLVGSLLA+
Sbjct: 121 YKKFREVKVLFIITLILFFILLATSFVSANFYKEVLHQPNIYYLSNLRFPELLVGSLLAI 180

Query: 218 YGQTQNGRRQTENGKRQLLSLLCFGALLVCLFVIDKHDPFIPGIT 262
           Y    N + Q       +L++L    L  CLF+++ +  FIPGIT
Sbjct: 181 YHNLSN-KVQLSKQVNNILAILSTLLLFSCLFLMNNNIAFIPGIT 224





This analysis, including the identification of several putative transmembrane domains, suggests that 
these proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens 
for vaccines or diagnostics, or for raising antibodies. 

Example 99 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 835>: 

1 ..ATTATTTACG AATACCGCTG GATGTTTCTT TACGGCGCAC TGACGACCTT

51 GGGGCTGACG GTCGTGGCAA C.GCGGGCGG TTCGGTATTG GGTCTGTTGT 

101 TGGCGTTGGC GCGCCTGATT CACTTGGAAA AAGCCGGTGC GCCGATGCGC 

151 GTGCTGGCGT GGGCGTTGCG TAAAGTTTCG CTGCTGTATG TTACGCTGTT 

201 CCGGGGTACG CCGCTGTTTG TGCAGATTGT GATTTGGGCG TATGTGTGGT 

251 TTCCGTTTTT CGTC..

This corresponds to the amino acid sequence <SEQ ID 836; ORF129>:



Further work revealed the complete nucleotide sequence <SEQ ID 837>: 

1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA 

51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCAACG GCGGGCGGTT 

101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AAGTTTCGCT 

201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG

451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT 

501 GCCGCAGGCA TTGCGCCGCA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA

This corresponds to the amino acid sequence <SEQ ID 838; ORF129-1>:

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI

101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA

151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL

201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*
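
The correspondence between the nucleotide sequence and the amino acid sequence can be reproduced with a frame-1 translation. The sketch below is illustrative only: it assumes the standard genetic code (NCBI translation table 1), and the function name `translate` is introduced here for the example. It translates the first and last printed rows of SEQ ID 837 and compares them with the start and end of SEQ ID 838.

```python
from itertools import product

# Standard genetic code (assumed: NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate a DNA string in frame 1; ambiguous or unknown codons become 'X'."""
    dna = "".join(dna.split()).upper()  # drop the spaces between 10-base groups
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


# First and last rows of SEQ ID 837 as printed above.
first_row = "ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA"
last_row = "GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA"

print(translate(first_row))
# The last row starts at base 701, which is the second base of a codon,
# so the first two bases are skipped to stay in frame.
print(translate(last_row[2:]))
```

Codons containing an unread base (printed as "." in the partial sequences) fall through to "X", which matches how such positions appear in the partial ORF translations.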

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A)

ORF129 shows 98.9% identity over an 88 aa overlap with an ORF (ORF129a) from strain A of N.
meningitidis:

                          10        20        30        40        50
orf129.pep        IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW
                  |||||||||||||||||||||||:||||||||||||||||||||||||||||||
orf129a     MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV

The complete length ORF129a nucleotide sequence <SEQ ID 839> is:



1 ATGGATTTTC GTTTTGACAT TATTTACGAA TACCGCTGGA TGTTTCTTTA
51 CGGCGCACTG ACGACCTTGG GGCTGACGGT CGTGGCGACG GCGGGCGGTT
101 CGGTATTGGG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA
151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT
201 GCTGTATGTT ACGCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA
251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT
301 TTGGTTAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT
351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG
401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG
451 GCGCGTTCTT TGGGGCTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT
501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA
551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG
601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC
651 GCTTTACACC GTCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT
701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA



This encodes a protein having amino acid sequence <SEQ ID 840>: 

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI

101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA

151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL

201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129a and ORF129-1 show 100.0% identity in 248 aa overlap: 

orf129a.pep  MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1     MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129a.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1     ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129a.pep  SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1     SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129a.pep  EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129-1     EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE

orf129a.pep  KRYNPQHRX
             |||||||||
orf129-1     KRYNPQHRX

Homology with a predicted ORF from N. gonorrhoeae

ORF129 shows 98.9% identity over an 88 aa overlap with a predicted ORF (ORF129ng) from
N. gonorrhoeae:

orf129.pep        IIYEYRWMFLYGALTTLGLTVVAXAGGSVLGLLLALARLIHLEKAGAPMRVLAW
                  |||||||||||||||||||||||:||||||||||||||||||||||||||||||
orf129ng    MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFV
            ||||||||||||||||||||||||||||||||||
orf129ng    ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVILHTAFLGNAMRQSRRVPDKGRWIAG

An ORF129ng nucleotide sequence <SEQ ID 841> was predicted to encode a protein having amino
acid sequence <SEQ ID 842>:

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVILHTAF

101 LGNAMRQSRR VPDKGRWIAG SLELNCQPRG RKTRGEFPPG ESNLGTEPRN

151 PLSMGQRRFP GCENWYPPQN FIKK*

Further work revealed the following gonococcal sequence <SEQ ID 843>: 



1 ATGGATTTTc gtTTTGACAT TATTTAcgaA TACCGCTGGA TGTTTCTTTA 

51 CGGCGCACTG Acgaccttgg ggctgacggt cgtggcgacg gCGGGCGGTT 

101 CGGtattggG TCTGTTGTTG GCGTTGGCGC GCCTGATTCA CTTGGAAAAA 

151 GCCGGTGCGC CGATGCGCGT GCTGGCGTGG GCGTTGCGTA AGGTTTCGCT 

201 GCTGTACGTT ACCCTGTTCC GGGGTACGCC GCTGTTTGTG CAGATTGTGA 

251 TTTGGGCGTA TGTGTGGTTT CCGTTTTTCG TCCATCCTTC AGACGGCATT 

301 TTGGTCAGCG GCGAGGCGGC AATCGCGCTG CGTCGCGGAT ACGGGCCGCT 

351 GATTGCCGGT TCTTTGGCAC TGATCGCCAA CTCGGGGGCG TATATCTGTG 

401 AGATTTTCCG CGCGGGCATC CAGTCTATAG ACAAAGGACA GATGGAGGCG 

451 GCGTGTTCTT TGGGACTGAC CTATCCGCAG GCGATGCGCT ATGTGATTCT 

501 GCCGCAGGCA TTGCGCCGTA TGCTGCCGCC TTTGGCGAGC GAGTTCATCA 

551 CGCTCTTGAA AGACAGCTCG CTGCTGTCGG TCATTGCTGT GGCGGAGTTG 

601 GCGTATGTTC AGAATACGAT TACGGGCCGG TATTCGGTTT ATGAAGAACC 

651 GCTTTACACC GCCGCCCTGA TTTATCTGTT GATGACGACT TTCTTAGGCT 

701 GGATATTCCT GCGTTTGGAA AAACGTTACA ATCCGCAACA CCGCTGA 

This corresponds to the amino acid sequence <SEQ ID 844; ORF129ng-1>:

1 MDFRFDIIYE YRWMFLYGAL TTLGLTVVAT AGGSVLGLLL ALARLIHLEK

51 AGAPMRVLAW ALRKVSLLYV TLFRGTPLFV QIVIWAYVWF PFFVHPSDGI

101 LVSGEAAIAL RRGYGPLIAG SLALIANSGA YICEIFRAGI QSIDKGQMEA

151 ARSLGLTYPQ AMRYVILPQA LRRMLPPLAS EFITLLKDSS LLSVIAVAEL

201 AYVQNTITGR YSVYEEPLYT VALIYLLMTT FLGWIFLRLE KRYNPQHR*

ORF129ng-1 and ORF129-1 show 99.2% identity in 248 aa overlap:

orf129-1.pep  MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129ng-1    MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW

orf129-1.pep  ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf129ng-1    ALRKVSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAG

orf129-1.pep  SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS
              ||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||
orf129ng-1    SLALIANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLAS

orf129-1.pep  EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE
              ||||||||||||||||||||||||||||||||||||||||:|||||||||||||||||||
orf129ng-1    EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLE

orf129-1.pep  KRYNPQHRX
              |||||||||
orf129ng-1    KRYNPQHRX
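
The identity figure can be cross-checked by listing the positions at which the two aligned sequences differ. The following is a minimal sketch: the function name `compare_aligned` and its `offset` parameter are hypothetical, introduced only for this illustration. It compares residues 121 to 240 of ORF129-1 and ORF129ng-1 as printed in the third and fourth alignment rows above; the other rows are shown as fully identical.

```python
def compare_aligned(seq_a, seq_b, offset=1):
    """Compare two equal-length aligned sequences.

    Returns (identical_count, [(position, residue_a, residue_b), ...]);
    positions are 1-based, starting at `offset` so that fragments can be used.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    diffs = [(i + offset, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    return len(seq_a) - len(diffs), diffs


# Third and fourth alignment rows (residues 121-240 of each protein).
orf129_1 = ("SLALIANSGAYICEIFRAGIQSIDKGQMEAARSLGLTYPQAMRYVILPQALRRMLPPLAS"
            "EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTVALIYLLMTTFLGWIFLRLE")
orf129ng_1 = ("SLALIANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLAS"
              "EFITLLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLE")

identical, diffs = compare_aligned(orf129_1, orf129ng_1, offset=121)
print(diffs)

# The remaining 128 positions of the 248 aa overlap are printed as identical.
overall = 100 * (248 - len(diffs)) / 248
print(f"{overall:.1f}% identity over 248 aa")
```

Two substitutions in 248 positions give 246/248, which agrees with the stated 99.2%.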

In addition, ORF129ng-1 is homologous to an ABC transporter from A. fulgidus:

2650409 (AE001090) glutamine ABC transporter, permease protein (glnP)

[Archaeoglobus fulgidus] Length = 224
Score = 132 bits (329), Expect = 2e-30

Identities = 86/178 (48%), Positives = 103/178 (57%), Gaps = 18/178 (10%)

Query: 65  VSLLYVTLFRGTPLFVQIVIWAYVWFPFFVHPSDGILVSGEAAIALRRGYGPLIAGSLAL 124
           +S  YV + RGTPL VQI+I       +F  P+ GI +  E A            G +AL
Sbjct: 58  ISTAYVEVIRGTPLLVQILI------VYFGLPAIGINLQPEPA------------GIIAL 99

Query: 125 IANSGAYICEIFRAGIQSIDKGQMEAACSLGLTYPQAMRYVILPQALRRMLPPLASEFIT 184
              SGAYI EI RAGI+SI  GQMEAA SLG+TY QAMRYVI PQA R +LP L +EFI
Sbjct: 100 SICSGAYIAEIVRAGIESIPIGQMEAARSLGMTYLQAMRYVIFPQAFRNILPALGNEFIA 159

Query: 185 LLKDSSLLSVIAVAELAYVQNTITGRYSVYEEPLYTAALIYLLMTTFLGWIFLRLEKR 242
           LLKDSSLLSVI++ EL  V   I         P    AL YL+MT  L  +    +K+
Sbjct: 160 LLKDSSLLSVISIVELTRVGRQIVNTTFNAWTPFLGVALFYLMMTIPLSRLVAYSQKK 217

This analysis, including the identification of transmembrane domains in the two proteins, suggests 
that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful 
antigens for vaccines or diagnostics, or for raising antibodies. 
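
The text does not state which method was used to identify the transmembrane domains. As an illustration only, the sketch below applies a Kyte-Doolittle hydropathy scan, with the common convention of a 19-residue window and a mean hydropathy threshold of 1.6. The method, the window, the threshold and the function names are all assumptions made for this example, not details taken from the analysis above.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    scores = [KD[res] for res in seq]
    return [sum(scores[i:i + window]) / window for i in range(len(scores) - window + 1)]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """1-based start positions of windows whose mean hydropathy exceeds the threshold."""
    return [i + 1 for i, h in enumerate(hydropathy_profile(seq, window)) if h > threshold]


# N-terminal 60 residues of ORF129-1 (SEQ ID 838).
n_terminus = "MDFRFDIIYEYRWMFLYGALTTLGLTVVATAGGSVLGLLLALARLIHLEKAGAPMRVLAW"
print(candidate_tm_windows(n_terminus))
```

A run of consecutive start positions above the threshold marks a candidate membrane-spanning segment; a dedicated topology predictor would be needed to confirm it.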



Example 100 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 845>: 

1 ..CTGAAAGAAT GCCGTCTGAA AGACCCTGTT TTTATTCCAA ATATCGTTTA

51 TAAGAACATC GCCATTACTT TCCTGCTCTT GCACGCCGCC GCCGAACTTT 

101 GGCTGCCCGC GCAAACCGCC GGTTTTACCG CGCTCGCCGT CGGCTTCATC 

151 CTGCTCGCCA AGCTGCGTGA gCTTCACCAT CACGAACTCT TACGTAAACA 

201 cTACGTCCGC ACTTATTACy TGCTCCAACT CTTTGCCGCC GCAGgcJAgT 

251 TTGTGGACAG GCGCGGCGwA ATTACAAAAC CTGCCCGCyT CCGCGCCCCT 

301 GCACCTGATT ACCCTCGGCG GCATGATGGG CGGCGTGATG ATGGTGTGGc 

351 TGACCGCCGG ACTGTGGCAC AGCGGCTTTA CCAAACTCGA CTACCCCAAA 

401 CTCTGCCGCA TTGCCGTCCC CATCCTTTTC GCCGCCGCCG TCTCGCGCGC 

451 TTTCTTGrTG AACGTGAACC CGrTATTTTT CATTACCGTT CCTGCGATTC 

501 TGACCGCCGC CGTATTCGTA CTGTATCTTT TCrCGTTTAT ACCGATATTT 

551 CGGGCGAATG CGTTTACAGA CGATCCGGAr TAr 

This corresponds to the amino acid sequence <SEQ ID 846; ORF130>: 

1 ..LKECRLKDPV FIPNIVYKNI AITFLLLHAA AELWLPAQTA GFTALAVGFI

51 LLAKLRELHH HELLRKHYVR TYYLLQLFAA AGSLWTGAAX LQNLPASAPL

101 HLITLGGMMG GVMMVWLTAG LWHSGFTKLD YPKLCRIAVP ILFAAAVSRA

151 FLXNVNPXFF ITVPAILTAA VFVLYLFXFI PIFRANAFTD DPE*

Further work revealed the complete nucleotide sequence <SEQ ID 847>: 

1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT 

51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG 

151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT 

201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA 

251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC 

301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT 

351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG 

401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG 

451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA

501 ATGCCGTCTG AAAGACCCTG TTTTTATTCC AAATATCGTT TATAAAAACA 

551 TCGCCATTAC TTTCCTGCTC TTGCACGCCG CCGCCGAACT TTGGCTGCCC 

601 GCGCAAACCG CCGGTTTTAC CGCGCTCGCC GTCGGCTTCA TCCTGCTCGC 

651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CTTACGTAAA CACTACGTCC 

701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA 

751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT

801 TACCCTCGGC GGCATGATGG GCGGCGTGAT GATGGTGTGG CTGACCGCCG
851 GACTGTGGCA CAGCGGCTTT ACCAAACTCG ACTACCCCAA ACTCTGCCGC 
901 ATTGCCGTCC CCATCCTTTT CGCCGCCGCC GTCTCGCGCG CTTTCTTGAT 
951 GAACGTGAAC CCGATATTTT TCATTACCGT TCCTGCGATT CTGACCGCCG 

1001 CCGTATTCGT ACTGTATCTT TTCACGTTTA TACCGATATT TCGGGCGAAT 

1051 GCGTTTACAG ACGATCCGGA ATAA 

This corresponds to the amino acid sequence <SEQ ID 848; ORF130-1>: 

1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL
51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC
101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSILL GAEALKECRL KDPVFIPNIV YKNIAITFLL LHAAAELWLP
201 AQTAGFTALA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT
251 GAAKLQNLPA SAPLHLITLG GMMGGVMMVW LTAGLWHSGF TKLDYPKLCR
301 IAVPILFAAA VSRAFLMNVN PIFFITVPAI LTAAVFVLYL FTFIPIFRAN
351 AFTDDPE*

Computer analysis of this amino acid sequence gave the following results: 
Homology with a predicted ORF from N. meningitidis (strain A) 

ORF130 shows 94.3% identity over a 193 aa overlap with an ORF (ORF130a) from strain A of N.
meningitidis:

                                                  10        20        30
orf130.pep                                LKECRLKDPVFIPNIVYKNIAITFLLLHAA
                                          ||||||||||||||:|||||||||||||||
orf130a     LNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNVVYKNIAITFLLLHAA
               140       150       160       170       180       190

                    40        50        60        70        80        90
orf130.pep  AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX
            |||||||||||||:|||||||||||||||||||||||||||||||||||||| ||||||
orf130a     AELWLPAQTAGFTSLAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGYLWTGAAK
               200       210       220       230       240       250

                   100       110       120       130       140       150
orf130.pep  LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA
            ||||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||
orf130a     LQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA
               260       270       280       290       300       310

                   160       170       180       190
orf130.pep  FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPEX
             | |||| ||||||||||||||||||::|:||||||||||||||
orf130a     VLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPEX
               320       330       340       350

The complete length ORF130a nucleotide sequence <SEQ ID 849> is:

1 ATGCGGCCGT TTTTCGTCGG CGCGGCGGTG CTTGCCATAC TCGGTGCGCT

51 GGTGTTTTTC ATCAACCCCG GTGCCATCGT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCG GCATACGGCG GTTTTTTGAC TGCGGCTTTG 

151 TTGGACTGGA CGGGTTTTTC GGGTAACCTG AAACCTGTCG CGACTTTGAT 

201 GGCGGCATTA TTGCTCGCCG CATCCGCTAT ACTGCCCTTT TCGCCGCAAA 

251 CTGCCTCGTT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC

301 GCCCGGCTGA TTTGGCTAGA CCGAAACACC GACAACTTCG CCCTGCTAAT 

351 GTTACTTGCC GCGTTCACTG TTTTTCAGAC GGCATATGCC GTCAGCGGCG 

401 ATTTGAACCT GTTGCGCGCG CAAGTGCATC TAAATATGGC GGCGGTGATG

451 TTCGTATCCG TGCGCGTCAG TATTCTTTTG GGCGCGGAAG CCCTGAAAGA 

501 ATGCCGTCTG AAAGACCCAG TATTCATCCC CAATGTCGTC TATAAAAACA

551 TCGCCATTAC CTTCCTGCTC CTGCACGCCG CCGCCGAACT TTGGCTGCCT 

601 GCGCAAACCG CCGGTTTTAC CTCGCTCGCC GTCGGCTTTA TCCTGCTTGC 

651 CAAGCTGCGT GAGCTTCACC ATCACGAACT CCTGCGCAAA CACTACGTCC 

701 GCACTTATTA CCTGCTCCAA CTCTTTGCCG CCGCAGGCTA TTTGTGGACA

751 GGCGCGGCGA AATTACAAAA CCTGCCCGCC TCCGCGCCCC TGCACCTGAT

801 TACCCTCGGT GGCATGATGG GCAGCGTGAT GATGGTGTGG CTGACTGCCG 

851 GACTGTGGCA CAGCGGCTTT ACCAAGCTCG ACTACCCGAA ACTCTGCCGC 

901 ATCGCCGTCC CCATCCTNTT CGCCGCCGCC GTTTCGCGCG CTGTTTTAAT 

951 GAACGTAAAC CCGATATTCT TCATCACCGT CCCCGCAATT CTGACCGCCG 

1001 CCGTGTTCGT GCTTTACCTG CTGACATTCG TACCGATCTT TCGGGCGAAC

1051 GCGTTTACAG ACGATCCGGA ATAA 

This encodes a protein having amino acid sequence <SEQ ID 850>: 

1 MRPFFVGAAV LAILGALVFF INPGAIVLHR QIFLELMLPA AYGGFLTAAL
51 LDWTGFSGNL KPVATLMAAL LLAASAILPF SPQTASFFVA AYWLVLLLFC
101 ARLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM
151 FVSVRVSILL GAEALKECRL KDPVFIPNVV YKNIAITFLL LHAAAELWLP
201 AQTAGFTSLA VGFILLAKLR ELHHHELLRK HYVRTYYLLQ LFAAAGYLWT
251 GAAKLQNLPA SAPLHLITLG GMMGSVMMVW LTAGLWHSGF TKLDYPKLCR
301 IAVPILFAAA VSRAVLMNVN PIFFITVPAI LTAAVFVLYL LTFVPIFRAN
351 AFTDDPE*

ORF130a and ORF130-1 show 98.3% identity in 357 aa overlap: 

orf130a.pep  MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf130-1     MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL

orf130a.pep  KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
orf130-1     KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA

orf130a.pep  AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNVV
             ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||:|
orf130-1     AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV

orf130a.pep  YKNIAITFLLLHAAAELWLPAQTAGFTSLAVGFILLAKLRELHHHELLRKHYVRTYYLLQ
             |||||||||||||||||||||||||||:||||||||||||||||||||||||||||||||
orf130-1     YKNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ

orf130a.pep  LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGSVMMVWLTAGLWHSGFTKLDYPKLCR
             ||||||||||||||||||||||||||||||||||:|||||||||||||||||||||||||
orf130-1     LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR

orf130a.pep  IAVPILFAAAVSRAVLMNVNPIFFITVPAILTAAVFVLYLLTFVPIFRANAFTDDPE
             |||||||||||||| |||||||||||||||||||||||||:||:|||||||||||||
orf130-1     IAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPE

Homology with a predicted ORF from N. gonorrhoeae 

ORF130 shows 91.7% identity over a 193 aa overlap with a predicted ORF (ORF130ng) from
N. gonorrhoeae:

orf130.pep                                LKECRLKDPVFIPNIVYKNIAITFLLLHAA 30
                                          ||||||||||||||::||||||| ||||||
orf130ng    LNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPNVIYKNIAIT-LLLHAA 201

orf130.pep  AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGSLWTGAAX 90
            |||||||||||||||||||||||||||||||||||||||||||||||||||| ||||||
orf130ng    AELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQLFAAAGYLWTGAAK 261

orf130.pep  LQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVPILFAAAVSRA 150
            |||||||||||||||||| |||||||||||||||||||||||||||||| ||||:|||||
orf130ng    LQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCRIAVSILFASAVSRA 321

orf130.pep  FLXNVNPXFFITVPAILTAAVFVLYLFXFIPIFRANAFTDDPE 193
             | |||| |||||| |||||||:|||::|:|||||||||||||
orf130ng    VLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPE 364

An ORF130ng nucleotide sequence <SEQ ID 851> was predicted to encode a protein having amino
acid sequence <SEQ ID 852>:

1 MNKFFTHPMR PFFVGAAVLA ILGALVFFHQ PRRYHPAPPN FLGTYAAGCI
51 RRFFDYRFVG PDGFFRQPET CRYFDGGVVA CCGCFIAVFT ATCRIFRRRL
101 LAGVAAVLRL ADLARRQHRT LRSVDVTAAF TVFQTAYAVS GDLNLLRAQV
151 HLNMAAVMFV SVRVSVLLGT ETLKECRLKD PVFIPNVIYK NIAITLLLHA
201 AAELWLPAQT AGFTALAVGF ILLAKLRELH HHELLRKHYV RTYYLLQLFA
251 AAGYLWTGAA KLQNLPASAP LHLITLGGMT GGVMMVWLTA GLWHSGFTKL
301 DYPKLCRIAV SILFASAVSR AVLMNVNPIF FITVPEILTA AVFMLYLLTF
351 VPIFRANAFT DDPE*

Further work revealed the following gonococcal DNA sequence <SEQ ID 853>:

1 ATGCGCCCGT TTTTCGTCGG TGCGGCAGTA CTTGCCATAC TCGGTGCGTT 

51 GGTGTTTTTT ATCAACCCCG GCGCTATCAT CCTGCACCGC CAAATTTTCT 

101 TGGAACTTAT GCTGCCGGCT GCATACGGCG GTTTTTTGAC TACCGCTTTG 

151 TTGGACCGGA CGGGTTTTTC AGGCAACCTG AAACCTGCCG CTACTTTGAT 

201 GGCGGTGTTG TTGCTTGTTG CGGCTGTTTT ATTGCCGTTT TTACCGCAAC 

251 TTGCCGCATT TTTCGTCGCC GCCTATTGGC TGGTGTTGCT GCTGTTCTGC 

301 GCCTGGCTGA TTTGGCTCGA CCGCAACACC GACAACTTCG CTCTGTTGAT 

351 GTTACTTGCC GCATTTACCG TTTTTCAGAC GGCCTATGCC GTCAGCGGCG 

401 ATTTGAACTT ACTGCGCGCG CAAGTGCATT TGAATATGGC GGCGGTCATG 

451 TTCGTATCCG TCCGCGTCAG CGTCCTTTTG GGCACGGAAA CCCTGAAAGA 

501 ATGCCGTCTG AAAGACCCCG TATTCATCCC CAACGTTATC TATAAAAACA 

551 TCGCCATCAC CCTGCTGCTG CACGCCGCCG CCGAACTTTG GCTGCCCGCG 

601 CAAACCGCCG GTTTTACTGC GCTTGCCGTC GGCTTCATCC TGCTCGCCAA 

651 GCTGCGCGAA CTGCACCATC ACGAACTCTT ACGCAAACAC TACGTCCGCA 

701 CTTATTACCT GCTCCAGCTC TTTGCCGCCG CAGGTTATCT GTGGACAGGC 

751 GCGGCGAAAC TGCAAAACCT GCCCGCCTCC GCGCCCCTGC ACCTGATTAC 

801 CCTCGGCGGC ATGACGGGTG GCGTGATGAT GGTGTGGCTG ACTGCCGGAC 

851 TGTGGCACAG CGGCTTTACC AAACTCGACT ACCCGAAACT CTGCCGCATC 

901 GCCGTCTCCA TCCTTTTCGC CTCCGCCGTT TCGCGCGCTG TTTTAATGAA 

951 CGTGAATCCG ATATTCTTCA TCACCGTTCC CGAGATTCTG ACCGCCGCCG 

1001 TGTTCATGCT TTACCTGCTG ACGTTCGTAC CGATTTTTCG AGCGAACGCG 

1051 TTTACAGACG ATCCGGAATA A 

This corresponds to the amino acid sequence <SEQ ID 854; ORF130ng-1>:

1 MRPFFVGAAV LAILGALVFF INPGAIILHR QIFLELMLPA AYGGFLTTAL

51 LDRTGFSGNL KPAATLMAVL LLVAAVLLPF LPQLAAFFVA AYWLVLLLFC

101 AWLIWLDRNT DNFALLMLLA AFTVFQTAYA VSGDLNLLRA QVHLNMAAVM

151 FVSVRVSVLL GTETLKECRL KDPVFIPNVI YKNIAITLLL HAAAELWLPA

201 QTAGFTALAV GFILLAKLRE LHHHELLRKH YVRTYYLLQL FAAAGYLWTG

251 AAKLQNLPAS APLHLITLGG MTGGVMMVWL TAGLWHSGFT KLDYPKLCRI

301 AVSILFASAV SRAVLMNVNP IFFITVPEIL TAAVFMLYLL TFVPIFRANA

351 FTDDPE*

ORF130ng-1 and ORF130-1 show 92.4% identity in 357 aa overlap:

orf130-1.pep  MRPFFVGAAVLAILGALVFFINPGAIVLHRQIFLELMLPAAYGGFLTAALLDWTGFSGNL
              ||||||||||||||||||||||||||:||||||||||||||||||||:|||| |||||||
orf130ng-1    MRPFFVGAAVLAILGALVFFINPGAIILHRQIFLELMLPAAYGGFLTTALLDRTGFSGNL

orf130-1.pep  KPVATLMAALLLAASAILPFSPQTASFFVAAYWLVLLLFCARLIWLDRNTDNFALLMLLA
              ||:|||||:|||:|:::||| || |:||||||||||||||| ||||||||||||||||||
orf130ng-1    KPAATLMAVLLLVAAVLLPFLPQLAAFFVAAYWLVLLLFCAWLIWLDRNTDNFALLMLLA

orf130-1.pep  AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSILLGAEALKECRLKDPVFIPNIV
              |||||||||||||||||||||||||||||||||||||:|||:|:||||||||||||||::
orf130ng-1    AFTVFQTAYAVSGDLNLLRAQVHLNMAAVMFVSVRVSVLLGTETLKECRLKDPVFIPNVI

orf130-1.pep  YKNIAITFLLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ
              ||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||
orf130ng-1    YKNIAIT-LLLHAAAELWLPAQTAGFTALAVGFILLAKLRELHHHELLRKHYVRTYYLLQ

orf130-1.pep  LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMMGGVMMVWLTAGLWHSGFTKLDYPKLCR
              |||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||
orf130ng-1    LFAAAGYLWTGAAKLQNLPASAPLHLITLGGMTGGVMMVWLTAGLWHSGFTKLDYPKLCR

orf130-1.pep  IAVPILFAAAVSRAFLMNVNPIFFITVPAILTAAVFVLYLFTFIPIFRANAFTDDPEX
              ||| ||||:||||| ||||||||||||| |||||||:|||:||:||||||||||||||
orf130ng-1    IAVSILFASAVSRAVLMNVNPIFFITVPEILTAAVFMLYLLTFVPIFRANAFTDDPEX

Based on this analysis, it is predicted that the proteins from N. meningitidis and N. gonorrhoeae,
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies.

Example 101



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 855>:

1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA 

101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATAGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG C.TGCGGGCT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA

351 CTGCTTGGAA AAG..

This corresponds to the amino acid sequence <SEQ ID 856; ORF131>: 



1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI 
51 GGESPPSLGD YEIPLSDGNS SVRANEYESA QQSYFYRKIG KFEXCGLDWR 
101 TRDGKPLIET FKQGGFDCLE K. . 

Further work revealed the complete nucleotide sequence <SEQ ID 857>: 



1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT 

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCCTCA 

101 CCGGCTGGTG TAAGCCGAGA AAACCGGCTG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GTCCGCCGTC TTTAGGGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT 

251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGCT GGATTGGCGT 

301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG GAGGATTTGA 

351 CTGCTTGGAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC 

401 GATGGTAA 

This corresponds to the amino acid sequence <SEQ ID 858; ORF131-1>:

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLTGWCKPR KPAAIDFWDI
51 GGESPPSLGD YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR
101 TRDGKPLIET FKQGGFDCLE KQGLRRNGLS ERVRW*

Computer analysis of this amino acid sequence gave the following results: 



Homology with a predicted ORF from N. meningitidis (strain A)

ORF131 shows 95.0% identity over a 121 aa overlap with an ORF (ORF131a) from strain A of N.
meningitidis:

                    10        20        30        40        50        60
orf131.pep  MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD
            |||||||||||||||||||||||||||||||||:|||||||||||||||||||||||| |
orf131a     MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED
                    10        20        30        40        50        60

                    70        80        90       100       110       120
orf131.pep  YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE
            ||||||||| ||||||||||||||||||||||| ||||||||||||||||||| |||||:
orf131a     YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK
                    70        80        90       100       110       120

orf131.pep  K
            |
orf131a     KQGLRRNGLSERVRWX

The complete length ORF131a nucleotide sequence <SEQ ID 859> is: 



1 ATGGAAATTC GGGCAATAAA ATATACGGCA ATGGCTGCGT TGCTTGCATT
51 TACGGTTGCA GGCTGCCGGT TGGCAGGTTG GTATGAGTGT TCGTCCCTGT
101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT
151 GGCGGCGAGA GTCCTCCGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA
201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCA CAACAATCTT
251 ACTTTTACAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT
301 ACGCGTGACG GCAAACCTTT GATTGAGACG TTCAAACAGG AAGGTTTTGA
351 TTGTTTGAAA AAGCAGGGGT TGCGGCGCAA CGGTCTGTCC GAGCGCGTCC
401 GATGGTAA

This encodes a protein having amino acid sequence <SEQ ID 860>: 

1 MEIRAIKYTA MAALLAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI
51 GGESPPSLED YEIPLSDGNR SVRANEYESA QQSYFYRKIG KFEACGLDWR 
101 TRDGKPLIET FKQEGFDCLK KQGLRRNGLS ERVRW* 

ORF131a and ORF131-1 show 97.0% identity in 135 aa overlap:

orf131a.pep  MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPPSLED
             |||||||||||||||||||||||||||||||||:|||||||||||||||||||||||| |
orf131-1     MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD

orf131a.pep  YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQEGFDCLK
             ||||||||||||||||||||||||||||||||||||||||||||||||||||| |||||:
orf131-1     YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE

orf131a.pep  KQGLRRNGLSERVRWX
             ||||||||||||||||
orf131-1     KQGLRRNGLSERVRWX

Homology with a predicted ORF from N. gonorrhoeae

ORF131 shows 89.3% identity over a 121 aa overlap with a predicted ORF (ORF131ng) from
N. gonorrhoeae:

orf131.pep  MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD 60
            ||||:||||| |||:||||||||||||||| ||:||||||||||||||||||||| || |
orf131ng    MEIRVIKYTATAALFAFTVAGCRLAGWYECLSLSGWCKPRKPAAIDFWDIGGESPLSLED 60

orf131.pep  YEIPLSDGNSSVRANEYESAQQSYFYRKIGKFEXCGLDWRTRDGKPLIETFKQGGFDCLE 120
            ||||||||| |||||||||||:||||||||||| |||||||||||||:| ||| ||||||
orf131ng    YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE 120

orf131.pep  K 121
            |
orf131ng    KQGLRRNGLSERVRW 135

A complete length ORF131ng nucleotide sequence <SEQ ID 861> was predicted to encode a 
protein having amino acid sequence <SEQ ID 862>: 

1 MEIRVIKYTA TAALFAFTVA GCRLAGWYEC LSLSGWCKPR KPAAIDFWDI
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR 
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW* 

Further work revealed the following gonococcal DNA sequence <SEQ ID 863>: 

1 ATGGAAATTC GGGTAATAAA ATATACGGCA ACGGCTGCGT TGTTTGCATT 

51 TACGGTTGCA GGCTGCCGGC TGGCGGGGTG GTATGAGTGT TCGTCCTTGT 

101 CCGGCTGGTG TAAGCCGAGA AAACCTGCCG CCATCGATTT TTGGGATATT 

151 GGCGGCGAGA GtccgctGTC TTTAGAGGAC TACGAGATAC CGCTTTCAGA 

201 CGGCAATCGT TCCGTCAGGG CAAACGAATA TGAATCCGCG CAAAAATCTT

251 ACTTTTATAG GAAAATAGGG AAGTTTGAAG CCTGCGGGTT GGATTGGCGT

301 ACGCGTGACG GCAAACCTTT GGTTGAGAGG TTCAAACAGG AAGGTTTCGA 

351 CTGTTTGGAA AAGCAGGGGT TGCGGCGCAA CGGCCTGTCC GAGCGCGTCC 

401 GATGGTAA

This corresponds to the amino acid sequence <SEQ ID 864; ORF131ng-1>:

1 MEIRVIKYTA TAALFAFTVA GCRLAGWYEC SSLSGWCKPR KPAAIDFWDI
51 GGESPLSLED YEIPLSDGNR SVRANEYESA QKSYFYRKIG KFEACGLDWR
101 TRDGKPLVER FKQEGFDCLE KQGLRRNGLS ERVRW*

ORF131ng-1 and ORF131-1 show 92.6% identity in 135 aa overlap:

orf131ng-1.pep  MEIRVIKYTATAALFAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDIGGESPLSLED
                ||||:||||| |||:||||||||||||||||||:||||||||||||||||||||| || |
orf131-1        MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDIGGESPPSLGD

orf131ng-1.pep  YEIPLSDGNRSVRANEYESAQKSYFYRKIGKFEACGLDWRTRDGKPLVERFKQEGFDCLE
                |||||||||||||||||||||:|||||||||||||||||||||||||:| ||| ||||||
orf131-1        YEIPLSDGNRSVRANEYESAQQSYFYRKIGKFEACGLDWRTRDGKPLIETFKQGGFDCLE

orf131ng-1.pep  KQGLRRNGLSERVRWX
                ||||||||||||||||
orf131-1        KQGLRRNGLSERVRWX

Based on the presence of a predicted prokaryotic membrane lipoprotein lipid attachment site, it is 
predicted that the proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be 
useful antigens for vaccines or diagnostics, or for raising antibodies. 
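
The text does not say how the lipid attachment site was predicted. As an illustration, the sketch below scans for the PROSITE pattern PS00013 (prokaryotic membrane lipoprotein lipid attachment site), assumed here to be `{DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C`, together with the two positional conditions usually quoted with it. The pattern as transcribed, the function name `lipid_attachment_site` and the use of a regular expression are assumptions made for this example.

```python
import re

# Assumed transcription of PROSITE PS00013 as a regular expression.
LIPOBOX = re.compile(r"[^DERK]{6}[LIVMFWSTAG]{2}[LIVMFYSTAGCQ][AGS]C")


def lipid_attachment_site(protein):
    """Return the 1-based position of the candidate lipidated cysteine, or None.

    Extra conditions applied (assumed from the pattern's usual description):
    the cysteine lies between positions 15 and 35, and at least one K or R
    occurs in the first seven residues.
    """
    for match in LIPOBOX.finditer(protein):
        cys_pos = match.end()  # the match ends on the cysteine, so end() is its 1-based position
        if 15 <= cys_pos <= 35 and re.search(r"[KR]", protein[:7]):
            return cys_pos
    return None


orf131_1 = "MEIRAIKYTAMAALLAFTVAGCRLAGWYECSSLTGWCKPRKPAAIDFWDI"    # SEQ ID 858, residues 1-50
orf131ng_1 = "MEIRVIKYTATAALFAFTVAGCRLAGWYECSSLSGWCKPRKPAAIDFWDI"  # SEQ ID 864, residues 1-50

print(lipid_attachment_site(orf131_1), lipid_attachment_site(orf131ng_1))
```

Under these assumptions both the meningococcal and the gonococcal sequence match at the cysteine in position 22.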

Example 102 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 865>:

1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT

51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG 

151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT

251 TGAACCTCGG CCTGCCtTAT ATtTcCGGCC CGCAATGGCT GTCGGAAAAC

301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACgC ACGGCAAAAC

351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATgCC GGCCTCGCGC 

401 CGGGCTTCCT TATtGGCGGC GTACC.GGAA AATttCGGCG TTTCCGCCCG

451 CCTGCCGCAA ACGCCGCGCC AAGACCCGAA CAGCCAATCG CCGTTTTTcG 

501 TCATCGAAGC CGACGAATAC GACACCGCCT TTtTCGACAA ACGTTCTAAA 

551 TtCGTGCATT ACCGTCCGCG TACCGCCGTG TTGAACAATC TGGAATTCGA 

601 CCACGCCGAC ATCTTTGCCG ACTTGGGCGC GATACAGACc CAGTTCCACT 

651 ACCTCGTGCG TACCGTGCCG TCTGAAGGCT TAATCGTCTG CAACGGACGG 

701 CAGCAAAGCC TGCAAGATAC TTTGGACAAA GGCTGCTGGA CGCCGGTGGA

751 AAAATTCGGC ACGGAACACG GCTGGCA..

This corresponds to the amino acid sequence <SEQ ID 866; ORF132>: 

1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV 

51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VXGKFRRFRP 

151 PAANAAPRPE QPIAVFRHRS RRIRHRLFRQ TFXIRALPSA YRRVEQSGIR 

201 PRRHLCRLGR DTDPVPLPRA YRAVXRLNRL QRTAAKPARY FGQRLLDAGG 

251 KIRHGTRLA. . 

Further work revealed the complete nucleotide sequence <SEQ ID 867>: 

1 ATGAAACACA TCCATATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGCT

51 TGCCGCCATT GCCAAAGAAG CGGGGTTTGA AGTCAGCGGT TGCGACGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG TATAGACGTG 

151 TATGAAGGCT TCGATGCCGC TCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCTCGG CCTGCCTTAT ATTTCCGGCC CGCAATGGCT GTCGGAAAAC 

301 GTGCTGCACC ATCATTGGGT ACTCGGTGTG GCGGGGACGC ACGGCAAAAC 

351 GACCACCGCC TCCATGCTCG CATGGGTCTT GGAATATGCC GGCCTCGCGC 

401 CGGGCTTCCT TATTGGCGGC GTACCGGAAA ATTTCGGCGT TTCCGCCCGC

451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT 

501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGTTCTAAAT

551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC

601 CACGCCGACA TCTTTGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACTA

651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCTT AATCGTCTGC AACGGACGGC

701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA

751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGACGG

801 CTCGTTCGAC GTGTTGCTCG ACGGCAAAAC CGCCGGACGC GTCAAATGGG

851 ATTTGATGGG CAGGCACAAC CGCATGAACG CGCTCGCCGT CATTGCCGCC

901 GCGCGTCATG TCGGTGTCGA TATTCAGACC GCCTGCGAAG CCTTGGGCGC

951 GTTTAAAAAC GTCAAACGCC GGATGGAAAT CAAAGGCACG GCAAACGGCA

1001 TCACCGTTTA CGACGACTTC GCCCACCACC CGACCGCCAT CGAAACCACG

1051 ATTCAAGGTT TGCGCCAACG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT

1101 CGAACCGCGT TCCAACACGA TGAAGCTGGG CACGATGAAG TCCGCCCTGC

1151 CTGTAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGTG

1201 GACTGGGACG TCGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGAACGT

1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG

1301 TAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC

1351 GGAAAGCTGC TGGAAGCTTT GAGATAG

This corresponds to the amino acid sequence <SEQ ID 868; ORF132-1>:

1 MKHIHIIGIG GTFMGGLAAI AKEAGFEVSG CDAKMYPPMS TQLEALGIDV

51 YEGFDAAQLD EFKADVYVIG NVAKRGMDVV EAILNLGLPY ISGPQWLSEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR

151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD

201 HADIFADLGA IQTQFHYLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE

251 KFGTEHGWQA GEANADGSFD VLLDGKTAGR VKWDLMGRHN RMNALAVIAA

301 ARHVGVDIQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT

351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPVSLKEA DQVFCYAGGV

401 DWDVAEALAP LGGRLNVGKD FDAFVAEIVK NAEVGDHILV MSNGGFGGIH

451 GKLLEALR*


Computer analysis of this amino acid sequence gave the following results: 
Homology with the hypothetical o457 protein of E. coli (accession number U14003)
ORF132 and o457 show 58% aa identity in 140 aa overlap:

Orf132: 4 IHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLDEFK 63
IHI+GI GTFMGGLA +A++ G EV+G DA +YPPMST LE GI++ +G+DA+QL+ +
o457: 3 IHILGICGTFMGGLAMLARQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-Q 61

Orf132: 64 ADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTASML 123
D+ +IGN RG VEA+L +PY+SGPQWL + VL WVL VAGTHGKTTTA M
o457: 62 PDLVIIGNAMTRGNPCVEAVLEKNIPYMSGPQWLHDFVLRDRWVLAVAGTHGKTTTAGMA 121

Orf132: 124 AWVLEYAGLAPGFLIGGVXG 143
W+LE G PGF+IGGV G
o457: 122 TWILEQCGYKPGFVIGGVPG 141
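
The identity figures quoted for alignments like the one above can be illustrated with a small sketch. It counts identical columns in two pre-aligned rows and divides by the number of columns; this definition (gaps counted as mismatches, unknown residues X never counted as identical) is an assumption made for illustration, and percent_identity is a hypothetical helper, not the program used in the original analysis.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percentage of alignment columns holding the same residue in both rows.

    Assumptions: rows are already aligned and of equal length, '-' marks a gap,
    and an unknown residue 'X' is never counted as a match.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for a, b in zip(row_a.upper(), row_b.upper())
        if a == b and a not in "-X"
    )
    return 100.0 * matches / len(row_a)


# Final block of the alignment above: Orf132 residues 124-143 vs o457 122-141.
orf132_block = "AWVLEYAGLAPGFLIGGVXG"
o457_block = "TWILEQCGYKPGFVIGGVPG"
print(f"{percent_identity(orf132_block, o457_block):.1f}%")  # 60.0%
```

For this 20-residue block the sketch finds 12 identical positions, the same positions that carry a letter in the middle line of the alignment.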

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF132 shows 74.6% identity over a 189aa overlap with an ORF (ORF132a) from strain A of N.
meningitidis:



10 20 30 40 50 60
orf132.pep MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD
orf132a MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD

70 80 90 100 110 120
orf132.pep EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA
orf132a EFKADVYVIGNVAKRGMDVVEAILNRGLPYISGPQWLAENXLHHHWXLGVAXTHGKTTTA
70 80 90 100 110 120

130 140 150 160
orf132.pep SMLAWVLEYAGLAPGFLIGGVXGKFR RFRPPAANAAPRPEQPI AVFR
orf132a SMLAWVLEYAGLAPGFXIGGVPENFSVSARL-PQTPRQDPNSQSPFFVIEADEYDTAFFD
130 140 150 160 170

170 180 190 200 210 220
orf132.pep HRSRRIRHRLFRQTFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRL
orf132a KRSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQD
180 190 200 210 220 230

The complete length ORF132a nucleotide sequence <SEQ ID 869> is: 

1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGTGGGAT

51 TGCCGCCATT GCCAAAGAAG CAGGGTTTGA ANTCAGCGGT TGCGATGCGA 

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTG 

151 TATGAAGGCT TCGACACCGC GCAGTTGGAC GAATTTAAAG CCGACGTTTA 

201 CGTTATCGGC AATGTCGCCA AGCGCGGGAT GGATGTGGTT GAAGCGATTT 

251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAAC 

301 NTGCTGCACC ATCATTGGNN ACTCGGCGTG GCGGNGACGC ACGGCAAAAC 

351 GACCACCGCG TCTATGCTCG CGTGGGTTTT GGAATATGCC GGACTCGCAC 

401 CGGGCTTCNT TATCGGCGGC GTACCGGAAA ACTTCAGCGT TTCCGCCCGC

451 CTGCCGCAAA CGCCGCGCCA AGACCCGAAC AGCCAATCGC CGTTTTTCGT 

501 CATTGAAGCC GACGAATACG ACACCGCGTT TTTCGACAAA CGCTCCAAAT 

551 TCGTGCATTA CCGTCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC 

601 CACGCCGACA TCTTCGCCGA TTTGGGCGCG ATACAGACCC AGTTCCACCA 

651 CCTCGTGCGT ACCGTGCCGT CTGAAGGCCT CATCGTCTGC AACGGACGGC 

701 AGCAAAGCCT GCAAGACACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA

751 AAATTCGGCA CGGAACACGG CTGGCAGGCC GGCGAAGCCA ATGCCGATGG

801 CTCGTTCGAC GTGTTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCTTGGA 

851 GTTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCNGT CATCGCCGCC 

901 GCGCGTCATG CCGGAGTNGA CATTCAGACG GCCTGCGAAG CCTTGAGCAC 

951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGTA 

1001 TCACCGTTTA CGACGACTTC GCCCACCATC CGACCGCTAT CGAAACCACG

1051 ATTCAAGGTT TGCGCCAGCG CGTCGGCGGC GCGCGCATCC TCGCCGTCCT 

1101 CGAACCGCGT TCCAATACGA TGAAGCTGGG TACGATGAAA GCCGCCCTGC

1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGNTACGC CGGCGGCGCG 

1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCGGCA GGCTGCACGT

1251 CGGCAAAGAC TTCGATGCCT TCGTTGCCGA AATCGTGAAA AACGCCGAAG 

1301 CAGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC 

1351 ACCAAACTGC TGGACGCTTT GAGATAG 

This encodes a protein having amino acid sequence <SEQ ID 870>: 

1 MKHIHIIGIG GTFMGGIAAI AKEAGFEXSG CDAKMYPPMS TQLEALGIGV

51 YEGFDTAQLD EFKADVYVIG NVAKRGMDVV EAILNRGLPY ISGPQWLAEN

101 XLHHHWXLGV AXTHGKTTTA SMLAWVLEYA GLAPGFXIGG VPENFSVSAR

151 LPQTPRQDPN SQSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD

201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGRQQSLQDT LDKGCWTPVE

251 KFGTEHGWQA GEANADGSFD VLLDGKKAGH VAWSLMGGHN RMNALAVIAA

301 ARHAGVDIQT ACEALSTFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT

351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK AALPASLKEA DQVFXYAGGA

401 DWDVAEALAP LGGRLHVGKD FDAFVAEIVK NAEAGDHILV MSNGGFGGIH

451 TKLLDALR*

ORF132a and ORF132-1 show 93.9% identity in 458 aa overlap:

orf132a.pep MKHIHIIGIGGTFMGGIAAIAKEAGFEXSGCDAKMYPPMSTQLEALGIGVYEGFDTAQLD
orf132-1 MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD

orf132a.pep EFKADVYVIGNVAKRGMDVVEAILNRGLPYISGPQWLAENXLHHHWXLGVAXTHGKTTTA
orf132-1 EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA

orf132a.pep SMLAWVLEYAGLAPGFXIGGVPENFSVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK
orf132-1 SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK

orf132a.pep RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGRQQSLQDT
orf132-1 RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT

orf132a.pep LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKKAGHVAWSLMGGHNRMNALAVIAA
orf132-1 LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKTAGRVKWDLMGRHNRMNALAVIAA

orf132a.pep ARHAGVDIQTACEALSTFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG
orf132-1 ARHVGVDIQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG

orf132a.pep ARILAVLEPRSNTMKLGTMKAALPASLKEADQVFXYAGGADWDVAEALAPLGGRLHVGKD
orf132-1 ARILAVLEPRSNTMKLGTMKSALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD

orf132a.pep FDAFVAEIVKNAEAGDHILVMSNGGFGGIHTKLLDALRX
orf132-1 FDAFVAEIVKNAEVGDHILVMSNGGFGGIHGKLLEALRX

Homology with a predicted ORF from N. gonorrhoeae 

ORF132 shows 89.6% identity over 259 aa overlap with a predicted ORF (ORF132ng) from N.
gonorrhoeae:

orf132.pep MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD 60
orf132ng MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE 60

orf132.pep EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA 120
orf132ng EFQADIYVIGNVARRGMDVVEAILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTA 120

orf132.pep SMLAWVLEYAGLAPGFLIGGVXGKFRRFRPPAANAAPRPEQPIAVFRHRSRRIRHRLFRQ 180
orf132ng SMLAWVLEYAGLAPGFLIGGVPGKFRRFRPPTANAASRPEQQIAVFRHRSRRIRHRLFRQ 180

orf132.pep TFXIRALPSAYRRVEQSGIRPRRHLCRLGRDTDPVPLPRAYRAVXRLNRLQRTAAKPARY 240
orf132ng TLQIRALSPAYRRVEQSGIRPRRHLRRLGRDTDPVPPPRAHRTIRRPHRLQRTAAKPARY 240

orf132.pep FGQRLLDAGGKIRHGTRLA 259
orf132ng FGQRLLDAGGKIRHRTRLADW 261

An ORF132ng nucleotide sequence <SEQ ID 871> was predicted to encode a protein having amino
acid sequence <SEQ ID 872>:

1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV

51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPGKFRRFRP

151 PTANAASRPE QQIAVFRHRS RRIRHRLFRQ TLQIRALSPA YRRVEQSGXR

201 PRRHLRRLGR DTDPVPPPRA HRTIRRPHRL QRTAAKPARY FGQRLLDAGG

251 KIRHRTRLAD W*



Further work revealed the following gonococcal DNA sequence <SEQ ID 873>:

1 ATGAAACACA TCCACATTAT CGGTATCGGC GGCACGTTTA TGGGCGGGAT

51 TGCCGCCATT GCCAAAGAAG CCGGGTTCAA AGTCAGCGGT TGCGACGCGA

101 AGATGTATCC GCCGATGAGC ACCCAGCTCG AAGCCTTGGG CATAGGCGTA

151 CACGAAGGCT TCGATGCCGC GCAGTTGGAA GAATTTCAAG CCGATATTTA

201 CGTCATCGGC AATGTCGCCA GGCGCGGGAT GGATGTGGTC GAGGCGATTT

251 TGAACCGTGG GCTGCCTTAT ATTTCCGGCC CGCAATGGCT GGCTGAAAac

301 GTGCtgcacc atcaTTGGgt ACTCGGCGTG GcagggaCGC ACGGcaaAac

351 gaccaCcGcg tCCATGCTCG CCTGGGTCTT GGAATATGCC GGACTCGCGC

401 CGGGCTTCCT CATCGGCGGt gtaccggaAA ATTTCGGCGT TTCCGCCCGC

451 CTACCGCAAA CGCCGCGTCA AGACCCGAAC AGCAAATCGC CGTTTTTCGT

501 CATCGAAGCC GACGAATACG ACACCGCCTT TTTCGACAAA CGCTCCAAAT

551 TCGTGCATTA TCGCCCGCGT ACCGCCGTGT TGAACAATCT GGAATTCGAC

601 CACGCCGACA TCTTCGCCGA CTTGGGCGCG ATACAGACCC AGTTCCACCA

651 CCTCGTGCGC ACCGTACCAT CCGAAGGCCT CATCGTCTGC AACGGACAGC

701 AGCAAAGCCT GCAAGATACT TTGGACAAAG GCTGCTGGAC GCCGGTGGAA

751 AAATTCGGCA CCGGACACGG CTGGCAGATT GGTGAAGTCA ATGCCGACGG

801 CTCGTTCGAC GTATTGCTTG ACGGCAAAAA AGCCGGACAC GTCGCATGGG

851 ATTTGATGGG CGGACACAAC CGCATGAACG CGCTCGCCGT CATCGCTGCC

901 GCACGCCATG CCGGAGTCGA TGTTCAGACG GCCTGCGAAG CCTTGGGTGC

951 GTTTAAAAAC GTCAAACGCC GCATGGAAAT CAAAGGCACG GCAAACGGCA

1001 TCACCGTTTA CGACGATTTC GCCCACCACC CGACCGCCAT CGAAACCACG

1051 ATTCAAGGTT TGCGCCAACG TGTCGGCGGC GCGCGCATCC TCGCCGTCCT

1101 CGAGCCGCGT TCCAACACCA TGAAACTCGG CACGATGAAG TCCGCCCTGC

1151 CCGCAAGCCT CAAAGAAGCC GACCAAGTGT TCTGCTACGC CGGCGGCGCG

1201 GACTGGGACG TTGCCGAAGC CCTCGCGCCT TTGGGCTGCA GGCTGCGCGT

1251 CGGTAAAGAT TTCGATACCT TCGTTGCCGA AATTGTGAAA AACGCCCGAA

1301 CCGGCGACCA TATTTTGGTG ATGAGCAACG GCGGTTTCGG CGGAATACAC

1351 ACCAAACTGC TGGACGCTTT GAGATAG

This corresponds to the amino acid sequence <SEQ ID 874; ORF132ng-1>:

1 MKHIHIIGIG GTFMGGIAAI AKEAGFKVSG CDAKMYPPMS TQLEALGIGV

51 HEGFDAAQLE EFQADIYVIG NVARRGMDVV EAILNRGLPY ISGPQWLAEN

101 VLHHHWVLGV AGTHGKTTTA SMLAWVLEYA GLAPGFLIGG VPENFGVSAR

151 LPQTPRQDPN SKSPFFVIEA DEYDTAFFDK RSKFVHYRPR TAVLNNLEFD

201 HADIFADLGA IQTQFHHLVR TVPSEGLIVC NGQQQSLQDT LDKGCWTPVE

251 KFGTGHGWQI GEVNADGSFD VLLDGKKAGH VAWDLMGGHN RMNALAVIAA

301 ARHAGVDVQT ACEALGAFKN VKRRMEIKGT ANGITVYDDF AHHPTAIETT

351 IQGLRQRVGG ARILAVLEPR SNTMKLGTMK SALPASLKEA DQVFCYAGGA

401 DWDVAEALAP LGCRLRVGKD FDTFVAEIVK NARTGDHILV MSNGGFGGIH

451 TKLLDALR*

ORF132ng-1 and ORF132-1 show 93.2% identity in 458 aa overlap:

orf132ng-1.pep MKHIHIIGIGGTFMGGIAAIAKEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLE
orf132-1 MKHIHIIGIGGTFMGGLAAIAKEAGFEVSGCDAKMYPPMSTQLEALGIDVYEGFDAAQLD

orf132ng-1.pep EFQADIYVIGNVARRGMDVVEAILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTA
orf132-1 EFKADVYVIGNVAKRGMDVVEAILNLGLPYISGPQWLSENVLHHHWVLGVAGTHGKTTTA

orf132ng-1.pep SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDK
orf132-1 SMLAWVLEYAGLAPGFLIGGVPENFGVSARLPQTPRQDPNSQSPFFVIEADEYDTAFFDK

orf132ng-1.pep RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHHLVRTVPSEGLIVCNGQQQSLQDT
orf132-1 RSKFVHYRPRTAVLNNLEFDHADIFADLGAIQTQFHYLVRTVPSEGLIVCNGRQQSLQDT

orf132ng-1.pep LDKGCWTPVEKFGTGHGWQIGEVNADGSFDVLLDGKKAGHVAWDLMGGHNRMNALAVIAA
orf132-1 LDKGCWTPVEKFGTEHGWQAGEANADGSFDVLLDGKTAGRVKWDLMGRHNRMNALAVIAA

orf132ng-1.pep ARHAGVDVQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG
orf132-1 ARHVGVDIQTACEALGAFKNVKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG

orf132ng-1.pep ARILAVLEPRSNTMKLGTMKSALPASLKEADQVFCYAGGADWDVAEALAPLGCRLRVGKD
orf132-1 ARILAVLEPRSNTMKLGTMKSALPVSLKEADQVFCYAGGVDWDVAEALAPLGGRLNVGKD

orf132ng-1.pep FDTFVAEIVKNARTGDHILVMSNGGFGGIHTKLLDALRX
orf132-1 FDAFVAEIVKNAEVGDHILVMSNGGFGGIHGKLLEALRX

In addition, ORF132ng-1 is homologous to a hypothetical E. coli protein:

pir||S56459 hypothetical protein o457 - Escherichia coli >gi|537075 (U14003)
ORF_o457 [Escherichia coli] >gi|1790680 (AE000494) hypothetical 48.5 kD protein
in fbp-pmba intergenic region [Escherichia coli] Length = 457
Score = 474 bits (1207), Expect = e-133
Identities = 249/439 (56%), Positives = 294/439 (66%), Gaps = 13/439 (2%)

Query: 22 KEAGFKVSGCDAKMYPPMSTQLEALGIGVHEGFDAAQLEEFQADIYVIGNVARRGMDVVE 81
++ G +V+G DA +YPPMST LE GI + +G+DA+QLE Q D+ +IGN RG VE
Sbjct: 21 RQLGHEVTGSDANVYPPMSTLLEKQGIELIQGYDASQLEP-QPDLVIIGNAMTRGNPCVE 79

Query: 82 AILNRGLPYISGPQWLAENVLHHHWVLGVAGTHGKTTTASMLAWVLEYAGLAPGFLIGGV 141
A+L + +PY+SGPQWL + VL WVL VAGTHGKTTTA M W+LE G PGF+IGGV
Sbjct: 80 AVLEKNIPYMSGPQWLHDFVLRDRWVLAVAGTHGKTTTAGMATWILEQCGYKPGFVIGGV 139

Query: 142 PENFGVSARLPQTPRQDPNSKSPFFVIEADEYDTAFFDKRSKFVHYRPRTAVLNNLEFDH 201
P NF VSA L +S FFVIEADEYD AFFDKRSKFVHY PRT +LNNLEFDH
Sbjct: 140 PGNFEVSAHL---------GESDFFVIEADEYDCAFFDKRSKFVHYCPRTLILNNLEFDH 190

Query: 202 ADIFADLGAIQTQFHHLVRTVPSEGLIVCNGQQQSLQDTLDKGCWTPVEKFGTGHGWQIG 261
ADIF DL AIQ QFHHLVR VP +G I+ +L+ T+ GCW+ EG WQ
Sbjct: 191

Query: 262
++ D S ++VLLDG+K G V W L+G HN N L IAAARH GV A ALG+F ]
Sbjct: 251 KLTTDASEWEVLLDGEKVGEVKWSLVGEHNMHNGLMAIAAARHVGVAPADAANALGSFI)

Query: 321 VKRRMEIKGTANGITVYDDFAHHPTAIETTIQGLRQRVGG-ARILAVLEPRSNTMKLGTM
+RR+E++G ANG+TVYDDFAHHPTAI T+ LR +VGG ARI+AVLEPRSNTMK+G
Sbjct: 311 ARRRLELRGEANGVTVYDDFAHHPTAILATLAALRGKVGGTARIIAVLEPRSNTMKMGK

Query: 380 KSALPASLKEADQVF-CYAGGADWDVAEALAPLGCRLRVGKDFDTFVAEIVKNARTGDHI
K L SL AD+VF W VAE D
Sbjct: 371 KDDLAPSLGRADEVFLLQPAHIPWQVAEVAEACVQPAHWSGDV

Query: 439 LVMSNGGFGGIHTKLLDAL 457
LVMSNGGFGGIH KLLD L
Sbjct: 431 LVMSNGGFGGIHQKLLDGL 449
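
The percentages in the summary line of this comparison (Identities = 249/439 (56%), Positives = 294/439 (66%), Gaps = 13/439 (2%)) are consistent with integer truncation rather than rounding: 249/439 is about 56.7% and is reported as 56%. The sketch below reproduces the three figures under that assumption; blast_percent is a hypothetical helper name, not a function of any alignment program.

```python
def blast_percent(count: int, total: int) -> int:
    """Whole-number percentage obtained by integer division (truncation).

    Assumption: the reported percentages are truncated, which is what the
    three figures in the summary line above are consistent with.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return count * 100 // total


# Counts taken from the summary line above, all over 439 aligned columns.
for label, count in (("Identities", 249), ("Positives", 294), ("Gaps", 13)):
    print(f"{label} = {count}/439 ({blast_percent(count, 439)}%)")
```

Rounding to the nearest integer would instead give 57%, 67% and 3%, which is why small differences between tools in reported identity are not unusual.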

Based on this analysis, it was predicted that these proteins from N. meningitidis and N. gonorrhoeae, 
and their epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 

ORF132-1 (26.4kDa) was cloned in pET and pGex vectors and expressed in E. coli, as described
above. The products of protein expression and purification were analyzed by SDS-PAGE. Figure
20A shows the results of affinity purification of the His-fusion protein, and Figure 20B shows the
results of expression of the GST-fusion in E. coli. Purified His-fusion protein was used to immunise
mice, whose sera were used for FACS analysis (Figure 20C) and ELISA (positive result). These
experiments confirm that ORF132 is a surface-exposed protein, and that it is a useful immunogen.

Example 103 

The following partial DNA sequence was identified in N. meningitidis <SEQ ID 875> 

1 . . CCGGGCTATT ACGGCTCGGA TGACGAATTT AAGCGGGCAT TCGGAGAAAA 

51 CTCGCCGACA TmCAAGAAAC ATTGCAACCG GAGCTGCGGG ATTTATGAAC 

101 CCGTATTGAA AAAATACGGC AAAAAGCGCG CCAACAACCA TTCGGTCAGC 

151 ATTAGTGCGG ACTTCGGCGA TTATTTCATG CCGTTCGCCA GCTATTCGCG 

201 CACACACCGT ATGCCCAACA TCCAAGAAAT GTATTTTTCC CAAATCGGCG 

251 ACTCCGGCGT TCACACCGCC TTAAAACCAG AGCGCGCAAA CACTTGGCAA 

301 TTTGGCTTCr ATACCTATAA AAAAGGATTG TTAAAACAAG ATGATACATT

351 AGGATTAAAA CTGGTCGGCT ACCGCAGCCG CATCGACAAC TACATCCACA

401 ACGTTTACGG GAAATGGTGG GATTTGAACG GGGATATTCC GAGCTGGGTC 

451 AGCAGCACCG GGCTTGCCTA CACCATCCAA CATCGCrATT TCAwAGACAA 

501 AGTGCATCAA nnnnnnnnnn nnnnnnnnnn nnnnTACGAT TATGGGCGTT 

551 TTTTCACCAA CCTTTCTTAC GCCTATCAAA AAAGCACGCA ACCGACCAAC 

601 TTCAGCGATG CGAGCGAATC GCCCAACAAT GCGTCCAAAG AAGACCAACT 

651 CAAACAAGGT TATGGGTTGA GCAGGGTTTC CGCCCTGCCG CGAGATTACG 

701 GACGTTTGGA AGTCGGTACG CGCTGGTTGG GCAACAAACT GACTTTGGGC 

751 GGCGCGATGC GCTATTTCGG CAAGAGCATC CGCGCGACGG CTGAAGAACG 

801 CTATATCGAC GGCACCAACG GGGGAAATAC CAGCAATTTC CGGCAACTGG 

851 GCAAGCGTTC CATCAAACAA ACCGAAACTC TTGCCCGCCA GCCTTTGATT

901 TTwGATTTTa ACGCCGCTTA CGAGCCGAAG AAAAACCTTA TTTTCCGCGC 

951 CGAAGTCAAA AATCTGTTCG ACAGGCGTTA TATCGATCCG CTCGATGCGG 

1001 GCAATGATGC GGCAAC.GAG CGTTATTACA GCTCGTTCGA CCCGAAAGAC

1051 AAGGACrrAG ACGTAACGTG TAATGCTGAT AAAACGTTGT GCaACGGCAA 

1101 ATACGGCGGC ACAAGCAAAA GCGTATTGAC CAATTTTGCA CGCGGACGCA 

1151 CCTTTTTgAT GACGATGAGC TACAAGTTTT AA 

This corresponds to the amino acid sequence <SEQ ID 876; ORF133>:

1 . . PGYYGSDDEF KRAFGENSPT XKKHCNRSCG IYEPVLKKYG KKRANNHSVS 

51 ISADFGDYFM PFASYSRTHR MPNIQEMYFS QIGDSGVHTA LKPERANTWQ 

101 FGFXTYKKGL LKQDDTLGLK LVGYRSRIDN YIHNVYGKWW DLNGDIPSWV 

151 SSTGLAYTIQ HRXFXDKVHQ XXXXXXXXYD YGRFFTNLSY AYQKSTQPTN 

201 FSDASESPNN ASKEDQLKQG YGLSRVSALP RDYGRLEVGT RWLGNKLTLG 

251 GAMRYFGKSI RATAEERYID GTNGGNTSNF RQLGKRSIKQ TETLARQPLI 

301 XDFNAAYEPK KNLIFRAEVK NLFDRRYIDP LDAGNDAAXE RYYSSFDPKD

351 KDXDVTCNAD KTLCNGKYGG TSKSVLTNFA RGRTFLMTMS YKF*
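
The X residues in this partial sequence line up with ambiguous positions in SEQ ID 875 (lower-case ambiguity codes such as m, r and w, runs of n, or a dot). The sketch below shows one simple way such positions can be carried into a translation: any codon containing a character other than A, C, G or T is reported as X. This is an illustrative assumption about the convention, using the standard genetic code; translate_ambiguous is a hypothetical helper name.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered T, C, A, G at each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_ambiguous(dna: str) -> str:
    """Translate in frame 1, writing 'X' for any codon with a non-ACGT symbol."""
    seq = "".join(dna.split()).upper()
    residues = []
    for start in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[start:start + 3]
        if any(base not in BASES for base in codon):
            residues.append("X")  # conservative: do not try to resolve the ambiguity
        else:
            i, j, k = (BASES.index(base) for base in codon)
            residues.append(AMINO[16 * i + 4 * j + k])
    return "".join(residues)


# Nucleotides 52 to 72 of SEQ ID 875 as printed above; 'm' stands for A or C.
print(translate_ambiguous("TCGCCGACATmCAAGAAACAT"))  # SPTXKKH
```

The output corresponds to residues 18 to 24 of SEQ ID 876 (SPTXKKH). A stricter implementation could resolve codons whose ambiguity does not change the encoded residue; the sketch deliberately does not.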

Further work revealed the further partial DNA sequence <SEQ ID 877>:

1 GAGGCGCAGA TACAGGTTTT GGAAGATGTG CACGTCAAGG CGAAGCGCGT 

51 ACCGAAAGAC AAAAAAGTGT TTACCGATGC GCGTGCCGTA TCGACCCGTC 

101 AGGATATATT CAAATCCAGC GAAAACCTCG ACAACATCGT ACGCAGCATC 

151 CCCGGTGCGT TTACACAGCA AGATAAAAGC TCGGGCATTG TGTCTTTGAA 

201 TATTCGCGGC GACAGCGGGT TCGGGCGGGT CAATACGATG GTGGACGGCA 

251 TCACGCAGAC CTTTTATTCG ACTTCTACCG ATGCGGGCAG GGCAGGCGGT 

301 TCATCTCAAT TCGGTGCATC TGTCGACAGC AATTTTATTG CCGGACTGGA 

351 TGTCGTCAAA GGCAGCTTCA GCGGCTCGGC AGGCATCAAC AGCCTTGCCG 

401 GTTCGGCGAA TCTGCGGACT TTAGGCGTGG ATGACGTCGT TCAGGGCAAT

451 AATACCTACG GCCTGCTGCT AAAAGGTCTG ACCGGCACCA ATTCAACCAA

501 AGGTAATGCG ATGGCGGCGA TAGGTGCGCG CAAATGGCTG GAAAGCGGAG 

551 CATCTGTCGG TGTGCTTTAC GGGCACAGCA GGCGCAGCGT GGCGCAAAAT 

601 TACCGCGTGG GCGGCGGCGG GCAGCACATC GGAAATTTTG GCGCGGAATA 

651 TTTGGAACGG CGCAAGCAGC GATATTTTGT ACAAGAGGGT GCTTTGAAAT 

701 TCAATTCCGA CAGCGGAAAA TGGGAGCGGG ATTTACAAAG GCAACAGTGG

751 AAATACAAGC CGTATAAAAA TTACAACAAC CAAGAACTAC AaAAATACAT

801 CGAAGAGCAT GACAAAAGCT GGCGGGAAAA CCTg.CaCCG CAATACGACA 

851 TTACCCCCAT CGATCCGTCC AGCCTGAAGC AGCAGTCGGC AGGCAATCTG

901 TTTAAATTGG AATACGACGG CGTATTCAAT AAATACACGG CGCAATTTCG 

951 CGATTTAAAC ACCAAAATCG GCAGCCGCAA AATCATCAAC CGCAATTATC

1001 AGTTCAATTA CGGTTTGTCT TTGAACCCGT ATACCAACCT CAATCTGACC

1051 GCAGCCTACA ATTCGGGCAG GCAGAAATAT CCGAAAGGGT CGAAGTTTAC

1101 AGGCTGGGGG CTTTTAAAGG ATTTTGAAAC CTACAACAAC GCGAAAATCC

1151 TCGACCTCAA CAACACCGCC ACCTTCCGGC TGCCCCGCGA AACCGAGTTG 

1201 CAAACCACTT TGGGCTTCAA TTATTTCCAC AACGAATACG GCAAAAACCG 

1251 CTTTCCTGAA GAATTGGGGC TGTTTTTCGA CGGTCCTGAT CAGGACAACG 

1301 GGCTTTATTC CTATTTGGGG CGGTTTAAGG GCGATAAAGG GCTGCTGCCC 

1351 CAAAAATCAA CCATTGTCCA ACCGGCCGGC AGCCAATATT TCAACACGTT 

1401 CTACTTCGAT GCCGCGCTCA AAAAAGACAT TTACCGCTTA AACTACAGCA

1451 CCAATACCGT CGGCTACCGT TTCGGCGGCG AATATACGGG CTATTACGGC

1501 TCGGATGACG AATTTAAGCG GGCATTCGGA GAAAACTCGC CGACATACAA 

1551 GAAACATTGC AACCGGAGCT GCGGGATTTA TGAACCCGTA TTGAAAAAAT 

1601 ACGGCAAAAA GCGCGCCAAC AACCATTCGG TCAGCATTAG TGCGGACTTC 

1651 GGCGATTATT TCATGCCGTT CGCCAGCTAT TCGCGCACAC ACCGTATGCC 

1701 CAACATCCAA GAAATGTATT TTTCCCAAAT CGGCGACTCC GGCGTTCACA
1751 CCGCCTTAAA ACCAGAGCGC GCAAACACTT GGCAATTTGG CTTCAATACC

1801 TATAAAAAAG GATTGTTAAA ACAAGATGAT ACATTAGGAT TAAAACTGGT
1851 CGGCTACCGC AGCCGCATCG ACAACTACAT CCACAACGTT TACGGGAAAT
1901 GGTGGGATTT GAACGGGGAT ATTCCGAGCT GGGTCAGCAG CACCGGGCTT
1951 GCCTACACCA TCCAACATCG CAATTTCAAA GACAAAGTGC ACAAACACGG

2001 TTTTGAGTTG GAGCTGAATT ACGATTATGG GCGTTTTTTC ACCAACCTTT 

2051 CTTACGCCTA TCAAAAAAGC ACGCAACCGA CCAACTTCAG CGATGCGAGC 

2101 GAATCGCCCA ACAATGCGTC CAAAGAAGAC CAACTCAAAC AAGGTTATGG 

2151 GTTGAGCAGG GTTTCCGCCC TGCCGCGAGA TTACGGACGT TTGGAAGTCG 

2201 GTACGCGCTG GTTGGGCAAC AAACTGACTT TGGGCGGCGC GATGCGCTAT 

2251 TTCGGCAAGA GCATCCGCGC GACGGCTGAA GAACGCTATA TCGACGGCAC 

2301 CAACGGGGGA AATACCAGCA ATTTCCGGCA ACTGGGCAAG CGTTCCATCA
2351 AACAAACCGA AACTCTTGCC CGCCAGCCTT TGATTTTTGA TTTTTACGCC 

2401 GCTTACGAGC CGAAGAAAAA CCTTATTTTC CGCGCCGAAG TCAAAAATCT
2451 GTTCGACAGG CGTTATATCG ATCCGCTCGA TGCGGGCAAT GATGCGGCAA
2501 CGCAGCGTTA TTACAGCTCG TTCGACCCGA AAGACAAGGA CGAAGACGTA
2551 ACGTGTAATG CTGATAAAAC GTTGTGCAAC GGCAAATACG GCGGCACAAG
2601 CAAAAGCGTA TTGACCAATT TTGCACGCGG ACGCACCTTT TTGATGACGA
2651 TGAGCTACAA GTTTTAA

This corresponds to the amino acid sequence <SEQ ID 878; ORF133-1>:

1 EAQIQVLEDV HVKAKRVPKD KKVFTDARAV STRQDIFKSS ENLDNIVRSI

51 PGAFTQQDKS SGIVSLNIRG DSGFGRVNTM VDGITQTFYS TSTDAGRAGG

101 SSQFGASVDS NFIAGLDVVK GSFSGSAGIN SLAGSANLRT LGVDDVVQGN

151 NTYGLLLKGL TGTNSTKGNA MAAIGARKWL ESGASVGVLY GHSRRSVAQN

201 YRVGGGGQHI GNFGAEYLER RKQRYFVQEG ALKFNSDSGK WERDLQRQQW

251 KYKPYKNYNN QELQKYIEEH DKSWRENLXP QYDITPIDPS SLKQQSAGNL

301 FKLEYDGVFN KYTAQFRDLN TKIGSRKIIN RNYQFNYGLS LNPYTNLNLT

351 AAYNSGRQKY PKGSKFTGWG LLKDFETYNN AKILDLNNTA TFRLPRETEL

401 QTTLGFNYFH NEYGKNRFPE ELGLFFDGPD QDNGLYSYLG RFKGDKGLLP

451 QKSTIVQPAG SQYFNTFYFD AALKKDIYRL NYSTNTVGYR FGGEYTGYYG

501 SDDEFKRAFG ENSPTYKKHC NRSCGIYEPV LKKYGKKRAN NHSVSISADF

551 GDYFMPFASY SRTHRMPNIQ EMYFSQIGDS GVHTALKPER ANTWQFGFNT

601 YKKGLLKQDD TLGLKLVGYR SRIDNYIHNV YGKWWDLNGD IPSWVSSTGL

651 AYTIQHRNFK DKVHKHGFEL ELNYDYGRFF TNLSYAYQKS TQPTNFSDAS

701 ESPNNASKED QLKQGYGLSR VSALPRDYGR LEVGTRWLGN KLTLGGAMRY

751 FGKSIRATAE ERYIDGTNGG NTSNFRQLGK RSIKQTETLA RQPLIFDFYA

801 AYEPKKNLIF RAEVKNLFDR RYIDPLDAGN DAATQRYYSS FDPKDKDEDV

851 TCNADKTLCN GKYGGTSKSV LTNFARGRTF LMTMSYKF*

Computer analysis of this amino acid sequence gave the following results: 

Homology with the probable TonB-dependent receptor HI121 of H. influenzae (accession number U32801)
ORF133 and HI121 show 57% aa identity in 363aa overlap:

Orf133: 31 IYEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTA 90
I EP+L K G K+A NHS ++SA+ DYFMPF +YSRTHRMPNIQEM+FSQ+ ++GV+TA
HI121: 563 INEPILHKSGHKKAFNHSATLSAELSDYFMPFFTYSRTHRMPNIQEMFFSQVSNAGVNTA 622

Orf133: 91 LKPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWV 150
LKPE+++T+Q GF TYKKGL QDD LG+KLVGYRS I NYIHNVYG WW +P+W
HI121: 623 LKPEQSDTYQLGFNTYKKGLFTQDDVLGVKLVGYRSFIKNYIHNVYGVWW--RDGMPTWA 680

Orf133: 151 SSTGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNN 210
S G YTI H+ + V YD GRFF N+SYAYQ++ QPTN++DAS PNN
HI121: 681 ESNGFKYTIAHQNYKPIVKKSGVELEINYDMGRFFANVSYAYQRTNQPTNYADASPRPNN 740

Orf133: 211 ASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYID 270
AS+ED LKQGYGLSRVS LP+DYGRLE+GTRW KLTLG A RY+GKS RAT EE YI+
HI121: 741 ASQEDILKQGYGLSRVSMLPKDYGRLELGTRWFDQKLTLGLAARYYGKSKRATIEEEYIN 800

Orf133: 271 GTNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDP 330
G+ + R+ ++K+TE + +QP+I D + +YEP K+LI +AEV+NL D+RY+DP
HI121: 801 GSR-FKKNTLRRENYYAVKKTEDIKKQPIILDLHVSYEPIKDLIIKAEVQNLLDKRYVDP 859

Orf133: 331 LDAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMS 390
LDAGNDAA +RYYSS + + C D + C GG+ K+VL NFARGRT++++++
HI121: 860 LDAGNDAASQRYYSSL-----NNSIECAQDSSAC----GGSDKTVLYNFARGRTYILSLN 910

Orf133: 391 YKF 393
YKF
HI121: 911 YKF 913

Homology with a predicted ORF from N. meningitidis (strain A)

ORF133 shows 90.8% identity over a 392aa overlap with an ORF (ORF133a) from strain A of N.
meningitidis:

10 20 30
orf133.pep PGYYGSDDEFKRAFGENSPTXKKHCNRSCGI
orf133a FYFDAALKKDIYRLNYSTNTVGYRFGGXYTGYYXSDDEFKRAFGENSPTYXKHCNQSCGI
450 460 470 480 490 500

40 50 60 70 80 90
orf133.pep YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL
orf133a YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL
510 520 530 540 550 560

100 110 120 130 140 150
orf133.pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS
orf133a KPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVS
570 580 590 600 610 620

160 170 180 190 200 210
orf133.pep STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA
orf133a STGLAYTIQHRNFKDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNA
630 640 650 660 670 680

220 230 240 250 260 270
orf133.pep SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG
orf133a SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDX
690 700 710 720 730 740

280 290 300 310 320 330
orf133.pep TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL
orf133a TNGXXTSNFRQLGKRSIXQTETLARQPLIFDXYAAYEPKKXLIFRAEVKNLFDRRYIDPL
750 760 770 780 790 800

340 350 360 370 380 390
orf133.pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY
orf133a DAGNDAATQRYYSSFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSY
810 820 830 840 850 860

orf133.pep KFX
orf133a KFX



A partial ORF133a nucleotide sequence <SEQ ID 879> is:



1 AAAGACAAAA AAGTGTTTAC CGATGCGCGT GCCGTATCGA CCCGTCAGGA 

51 TATATTCAAA TCCANCGAAA ACCTCGACAA CATCGTACGC ANCATCCCCG 

101 GTGCGTTTAC ACANCAANAT AAAAGCTCGG GCNTTGTGTC TTTGAATATT 

151 CGCNGCGACA GCGGGTTCGG GCGGGTCAAT ACNATGGTNG ACGGCATCAC

201 NCANACCTTT TATTCGACTT CTACCGATGC GGGCAGGGCA GGCGGTTCAT 

251 CTCAATTCGG TGCATCTGTC GACAGCAATT TTATNGCCGG ACTGGATGTC 

301 GTCAAAGGCA GCTTCAGCGG CTCGGCAGGC ATCAACAGCC TTGCCGGTTC
351 GGCGAATCTG CGGACTTTAN GCGTGGATGA TGTCGTTCAG GGCAATANTA

401 CNTACGGCCT GCTGCTAAAA GGTCTGACCG GCACCAATTC AACCAAAGGT
451 AATGCGATGG CGGCGATAGG TGCGCGCAAA TGGCTGGAAA GCGGAGCATC

501 TGTCGGTGTG CTTTACGGGC ACAGCAGGCG CAGCGTGGCG CAAAATTACC

551 GCGTGGGCGG CGGCGGGCAG CACATCGGAA ATTTTGGCGC GGAATATCTG 

601 GAACGACGCA AGCAACGATA TTTTGAGCAA GAAGGCGGGT TGAAATTCAA 

651 TTCCAACAGC GGAAAATGGG AGCGGGATTT CCAAAAGTCG TACTGGAAAA

701 CCAAGTGGTA TCAAAAATAC GATGCCCCCC AAGAACTGCA AAAATACATC

751 GAAGGTCATG ATAAAAGCTG GCGGGAAAAC CTGGCGCCGC AATACGACAT

801 CACCCCCATC GATCCGTCCA GCCTGAAGCN GCAGTCGGCA GGCAACCTGT 

851 TTAAATTGGA ATACGACGGC GTATTCAATA AATACACGGC GCAATTTCGC 

901 GATTTAAACA CCAAAATCGG CAGCCGCAAA ATCATCAACC GCAATTATCA 

951 ATTCAATTAC GGTTTGTCTT TGAACCCGTA TACCAACCTC AATCTGACCG 

1001 CAGCCTACAA TTCGGGCAGG CAGAAATATC CGAAAGGGTC GAAGTTTACA

1051 GGCTGGGGGC TTTTNAAAGA TTTTGAAACC TACAACAACG CAAAAATCCT 

1101 CGACCTCANC AACACCTCCA CCTTCCGGCT GCCCCGTGAA ACCGAGTTGC 

1151 AAACCACTTT GGGCTTCAAT TATTTCCACA ACGAATACGG CAAAAACCGC 

1201 TTTCCTGAAG AATTGGGGCT GTTTTTCGAC GGTCCGGATC ANGACAACGG 

1251 GCTTTATTCC TATTTGGGGC GGTTTAAGGG CGATAAAGGG CTGCTGCCCC 

1301 AAAAATCAAC CATTGTCCAA CCGGCCGGCA GCCAATATTT CAACACGTTC 

1351 TACTTCGATG CCGCGCTCAA AAAAGACATT TACCGCTTAA ACTACAGCAC

1401 CAATACCGTC GGCTACCGTT TCGGCGGCNA ATATACGGGC TATTACNGCT

1451 CGGATGACGA ATTTAAGCGG GCATTCGGAG AAAACTCGCC GACATACANG

1501 AAACATTGCA ACCAGAGCTG CGGAATTTAT GAACCCGTAT TGAAAAAATA

1551 CGGCAAAAAG CGCGCCAACA ACCATTCGGT CAGCATTAGT GCGGACTTCG 

1601 GCGATTATTT CATGCCGTTC GCCAGCTATT CGCGCACACA CCGTATGCCC 

1651 AACATCCAAG AAATGTATTT TTCCCAAATC GGCGACTCCG GCGTTCACAC 

1701 CGCCTTAAAA CCAGAGCGCG CAAACACTTG GCAATTTGGC TTCAATACCT

1751 ATAAAAAAGG ATTGTTAAAA CAAGATGATA TATTAGGATT AAAACTGGTC

1801 GGCTACCGCA GCCGCATCGA CNACTACATC CACAACGTTT ACGGGAAATG 

1851 GTGGGATTTG AACGGGAATA TTCCGAGCTG GGTCAGCAGC ACCGGGCTTG 

1901 CCTACACCAT CCAACACCGC AATTTCAAAG ACAAAGTGCA CAAACACGGT

1951 TTTGAGTTGG AGCTGAATTA CGATTATNGG CGTTTTTTCA CCAACCTTTC

2001 TTACGCCTAT CAAAAAAGCA CGCAACCGAC CAACTTCAGC GATGCGAGCG

2051 AATCGCCCAA CAATGCGTCC AAAGAAGACC AACTCAAACA AGGTTATGGG

2101 TTGAGCAGGG TTTCCGCCCT GCCGCGAGAT TACGGACGTT TGGAAGTCGG 

2151 TACGCGCTGG TTGGGCAACA AACTGACTTT GGGCGGCGCG ATGCGCTATT 

2201 TCGGCAAGAG CATCCGCGCG ACGGCTGAAG AACGCTATAT CGACGNCACC

2251 AATGGGGNAN NTACCAGCAA TTTCCGGCAA CTGGGCAAGC GTTCCATCAN 

2301 ACAAACCGAA ACCCTTGCCC GCCAGCCTTT GATTTTTGAT TTNTACGCCG 

2351 CTTACGAGCC GAAGAAAAAN CTTATTTTCC GCGCCGAAGT CAAAAATCTG 

2401 TTCGACAGGC GTTATATCGA TCCGCTCGAT GCGGGCAATG ATGCGGCAAC

2451 GCAGCGTTAT TACAGTTCGT TCGACCCGAA AGACAAGGAC GAAGAAGTAA

2501 CGTGTAATGA TGATAACACG TTATGCAACG GCAAATACGG CGGCACAAGC 

2551 AAAAGCGTAT TGACCAATTT TGCACGCGGA CNCACCTTTT TGATAACGAT 

2601 GAGCTACAAG TTTTAA

This encodes a protein having (partial) amino acid sequence <SEQ ID 880>: 



1 KDKKVFTDAR AVSTRQDIFK SXENLDNIVR XIPGAFTXQX KSSGXVSLNI 

51 RXDSGFGRVN TMVDGITXTF YSTSTDAGRA GGSSQFGASV DSNFXAGLDV 

101 VKGSFSGSAG INSLAGSANL RTLXVDDVVQ GNXTYGLLLK GLTGTNSTKG 

151 NAMAAI GARK WLESGASVGV LYGHSRRSVA QNYRVGGGGQ HIGNFGAEYL 

201 ERRKQRYFEQ EGGLKFNSNS GKWERDFQKS YWKTKWYQKY DAPQELQKYI 

251 EGHDKSWREN LAPQYDITPI DPSSLKXQSA GNLFKLEYDG VFNKYTAQFR 

301 DLNTKIGSRK IINRNYQFNY GLSLNPYTNL NLTAAYNSGR QKYPKGSKFT 

351 GWGLXKDFET YNNAKILDLX NTSTFRL PRE TELQTTLGFN YFHNEYGKNR 

4 01 FPEELGLFFD GPDXDNGLYS YLGRFKGDKG LLPQKSTIVQ PAGSQYFNTF 

4 51 YFDAALKKDI YRLNYSTNTV GYRFGGXYTG YYXSDDEFKR AFGENSPTYX 

501 KHCNQSCGIY EPVLKKYGKK RANNHSVSIS ADFGDYFMPF ASYSRTHRMP 

551 NIQEMYFSQI GDSGVHTALK PERANTWQFG FNTYKKGLLK QDDILGLKLV 

601 GYRSRIDXYI HNVYGKWWDL NGNIPSWVSS TGLAYTIQHR NFKDKVHKHG 

651 FELELNYDYX RFFTNLSYAY QKSTQPTNFS DASESPNNAS KEDQLKQGYG 

701 LSRVSALPRD YGRLEVGTRW LGNKLTLGGA MRYFGKSIRA TAEERYIDXT 

751 NGXXTSNFRQ LGKRSIXQTE TLARQPLIFD XYAAYEPKKX LIFRAEVKNL 

801 FDRRYIDPLD AGNDAATQRY YSSFDPKDKD EEVTCNDDNT LCNGKYGGTS 

851 KSVLTNFARG XTFLITMSYK F* 

ORF133a and ORF133-1 show 94.3% identity in 871 aa overlap: 



orf133a.pep 



10 20 30 40 

KDKKVFTDARAVSTRQDIFKSXENLDNIVRXIPGAFTXQXKS 
I I I I I I I I I I I I I I I I I I I I I I i I I I I I I MINI 

EAQIQVLEDVHVKAKRVPKDKKVFTDARAVSTRQDIFKSSENLDNIVRSIPGAFTQQDKS 



10 20 30 40 50 60 

50 60 70 80 90 100 

orf 133a . pep SGXVSLNIRXDSGFGRVNTMVDGITXTFYSTSTDAGRAGGSSQFGASVDSNFXAGLDWK 
II MINI I I I I I I 1 I I I I ! I I I I I I i I M I I I I I I I I I I I I I I I I I II I I I I I I I 
orf 133-1 SGIVSLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDWK 
70 80 90 100 110 120 

110 120 130 140 150 160 

orf 133a . pep GSFSGSAGINSLAGSANLRTLXVDDWQGNXTYGLLLKGLTGTNSTKGNAMAAIGARKWL 
I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I! I I I I I I I I I I I I I I I I I I 
orf 133-1 GSFSGSAGINSLAGSANLRTLGVDDWQGNNTYGLLLKGLTGTNSTKGNAMAAIGARKWL 
130 140 150 160 170 180 

170 180 190 200 210 220 

orf 133a . pep E SG AS VGVLYGHSRR S VAQNYRVGGGGQHI GN FGAE YLERRKQRYFE QE GG LKFN SN S GK 
I I I I I I I I I I I M I ! I I I I I I I I I I I I I I I I I I 1 I 1 I I I I I I I i I ! I I I : I I I I I : I I I 
orf 133-1 ESGASVGVLYGHSRRSVAQNYRVGGGGQHIGNFGAEYLERRKQRYFVQEGALKFNSDSGK 
190 200 210 220 230 240 

230 240 250 260 270 280 

orf 133a . pep WERDFQKSYWKTKWYQKYDAPQELQKYIEGHDKSWRENLAPQYDITPIDPSSLKXQSAGN 
1111:1:: II i I : : I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
orf 133-1 WERDLQRQQWKYKPYKNYNN-QELQKYIEEHDKSWRENLXPQYDITPIDPSSLKQQSAGN 
250 260 270 280 290 

290 300 310 320 330 340 

orf 133a . pep LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK 

I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I II I i I I I I I I I I I I I I I I I I I I I I I I II 
orf 133-1 LFKLEYDGVFNKYTAQFRDLNTKIGSRKIINRNYQFNYGLSLNPYTNLNLTAAYNSGRQK 

300 310 320 330 340 350 

350 360 370 380 390 400 

orf 133a. pep YPKGSKFTGWGLXKDFETYNNAKILDLXNTSTFRLPRETELQTTLGFNYFHNEYGKNRFP 

II I I I I I I I I I I I I I i I I I I I I I I I I I I : I I I I I I I I I I II I I I II II I I I I I I I 1 I I 
orf 133-1 YPKGSKFTGWGLLKDFETYNNAKILDLNNTATFRLPRETELQTTLGFNYFHNEYGKNRFP 

360 370 380 390 400 410 

410 420 430 440 450 460 

orf 133a. pep EELGLFFDGPDXDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR 
I I II I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I i I [ I I I I I II I I I I II I I I I I 
orf 133-1 EELGLFFDGPDQDNGLYSYLGRFKGDKGLLPQKSTIVQPAGSQYFNTFYFDAALKKDIYR 
420 430 440 450 460 470 

470 480 490 500 510 520 

orf 133a . pep LNYSTNTVGYRFGGXYTGYYXS DDE FKRAFGENS PTYXKHCNQ SCGI YE PVLKKYGKKRA 
I I I I 1 I I I I I I I I I I I II I I I I I I I I I I I I I I I II I II I : I I I I I I II II I I I I I I I 
orf 133-1 LNYSTNTVGYRFGGEYTGYYGSDDEFKRAFGENSPTYKKHCNRSCGI YE PVLKKYGKKRA 

480 490 500 510 520 530 

530 540 550 560 570 580 

orf 133a. pep NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN 
I II II M II I I I I I I II I I I I II I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I II I I I 
orf 133-1 NNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTALKPERANTWQFGFN 
540 550 560 570 580 590 

590 600 610 620 630 640 

orf 133a . pep TYKKGLLKQDDILGLKLVGYRSRIDXYIHNVYGKWWDLNGNIPSWVSSTGLAYTIQHRNF 
I I I I I I I I I I I I II M I I II I I I I I I II I II I I I I II I : I I I I I I II II II II II II I 
orf 133-1 TYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVSSTGLAYTIQHRNF 
600 610 620 630 640 650 

650 660 670 680 690 700 

orf 133a. pep KDKVHKHGFELELNYDYXRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I | | | | | | | | | | || | | | | 
orf 133-1 KDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNASKEDQLKQGYGLS 
660 670 680 690 700 710 
710 720 730 740 750 760 

RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDXTNGXXTSNFRQLG 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I I I III 1 I I I 11 1 1 
RVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDGTNGGNTSNFRQLG 
720 730 740 750 760 770 

770 780 790 800 810 820 

KRSIXQTETLARQPLIFDXYAAYEPKKXLIFRAEVKNLFDRRYIDPLDAGNDAATQRYYS 
I I I I I I I I I M I I I I I I II I I I I II I I I I M I I I I I I I I I I I I I I I I I II I I I I I I I 
KRSIKQTETLARQPLIFDFYAAYEPKKNLIFRAEVKNLFDRRYIDPLDAGNDAATQRYYS 
780 790 800 810 820 830 

830 840 850 860 870 

SFDPKDKDEEVTCNDDNTLCNGKYGGTSKSVLTNFARGXTFLITMSYKFX 
I I I I I I II I : I I 1 I I : I I I I I I I I I I I I I I I I I I I I I 111:1111111 
SFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSYKFX 
840 850 860 870 880 



Homology with a predicted ORF from N. gonorrhoeae 
ORF133 shows 92.3% identity over 392 aa overlap with a predicted ORF (ORF133ng) from N. 
gonorrhoeae: 

orfl33.pep PGYYGSDDE FKRAFGEN S PTXKKHCNRS CG I 31 

I I I II : : I I I I I I I I I I I : I : I I : III: 
orfl33ng FYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENE FKRAFGENSPAYKEHCDPSCGL 5 60 


orf 133 .pep YEPVLKKYGKKRANNHSVSISADFGDYFMPFASYSRTHRMPNIQEMYFSQIGDSGVHTAL 91 

I I I I I I I I I II I I I I I I I II I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I II I I I I I I I 
orfl33ng YEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNIQEMYFSQIGDSGVHTAL 620 

orf 133 .pep KPERANTWQFGFXTYKKGLLKQDDTLGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVS 151 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I M I II II II I I I I I I : 
orfl33ng KPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHNVYGKWWDLNGDIPSWVG 680 

orfl33 .pep STGLAYTIQHRXFXDKVHQXXXXXXXXYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 211 

I I I I I I I I : I I I MM: | | | | | M M I I I I I I I I I I I I I I II I I I I II I I 

orfl33ng STGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQKSTQPTNFSDASESPNNA 7 40 

orf 133 . pep SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 271 

M II I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I II I I II I M I I I II I I I I I I I I II 
orfl33ng SKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMRYFGKSIRATAEERYIDG 8 00 

orf 133 .pep TNGGNTSNFRQLGKRSIKQTETLARQPLIXDFNAAYEPKKNLIFRAEVKNLFDRRYIDPL 331 

I I II II I I I I I I I I II I I I I I I I I I I I I II I I I I I I I I I I M I I I I I I I I I II I II I 
orf 133ng TNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYE PKKNLIFRAEVKNLFDRRYIDPL 8 60 






orf 133 -pep DAGNDAAXERYYSSFDPKDKDXDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 3 91 

I I I I I I I : : I II II I I II I II I I II I I I I I I I 1 I II II I II II I I I I M M I I I I II I I 
orfl33ng DAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKSVLTNFARGRTFLMTMSY 920 

orf 133. pep KF 393 

orfl33ng KF 922 

The complete length ORF133ng nucleotide sequence <SEQ ID 881> is predicted to encode a 
protein having amino acid sequence <SEQ ID 882>: 



1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV 
51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN 
101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD 
151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK 
201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY 
251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY 
301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLFKLEYD GVFNKYTAQF 
351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF 
401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN 
451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT 
501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY 
551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM 
601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL 
651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH 
701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY 
751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG 
801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN 
851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT 
901 SKSVLTNFAR GRTFLMTMSY KF 



A variant was also identified, being encoded by the gonococcal DNA sequence <SEQ ID 883>: 



1 ATGAGATCTT CTTTCCGGTT GAAGCCGATT TGTTTTTATC TTATGGGTGT 
51 TATGCTATAT CATCATAGTT ATGCCGAAGA TGCAGGGCGC GCGGGCAGCG 
101 AGGCGCAGAT ACAGGTTTTG GAAGATGTGC ACGTCAAGGC GAAGCGCGTA 
151 CCGAAAGACA AAAAAGTGTT TACCGATGCG CGTGCCGTAT CGACCCGTca 
201 gGATGTGTTC AAATCCGGCG AAAACCTCGA CAACATCGTA CGCAGCATAC 
251 CCGGTGCGTT TACACAGCAA GATAAAAGCT CGGGCATTGT GTCTTTGAAT 
301 ATTCGCGGCG ACAGCGGGTT CGGGCGGGTC AATACGATGG TGGACGGCAT 
351 CACGCAGACC TTTTATTCGA CTTCTACCGA TGCGGGCAGG GCAGGCGGTT 
401 CATCTCAATT CGGTGCATCT GTCGACAGCA ATTTTATTGC CGGACTGGAT 
451 GTCGTCAAAG GCAGCTTCAG CGGCTCGGCA GGCATCAACA GCCTTGCCGG 
501 TTCGGCGAAT CTGCGGACTT TAGGCGTGGA TGACGTCGTT CAGGGCAATA 
551 ATACCTACGG CCTGCTGCTA AAAGGTCTGA CCGGCACCAA TTCAACCAAA 
601 GGTAATGCGA TGGCGGCGAT AGGTGCGCGC AAATGGCTGG AAAGCGGAGC 
651 GTCTGTCGGT GTGCTTTACG GGCACAGCAG GCGCGGCGTG GCGCAAAATT 
701 ACCGCGTGGG CGGCGGCGGG CAGCACATCG GAAATTTTGG TGAAGAATAT 
751 CTGGAACGGC GCAAACAGCA ATATTTTGTA CAAGAGGGTG GTTTGAAATT 
801 CAATGCCGGC AGCGGAAAAT GGGAACGGGA TTTGCAAAGG CAATACTGGA 
851 AAACAAAGTG GTATAAAAAA TACGAAGACC CCCAAGAACT GCAAAAATAC 
901 ATCGAAGAGC ATGATAAAAG CTGGCGGGAA AACCTGGCGC CGCAATACGA 
951 CATCACCCCC ATCGATCCGT CCGGCCTGAA GCAGCAGTCG GCAGGCAATC 
1001 TGTTTAAATT GGAATACGAC GGCGTATTCA ATAAATACAC GGCGCAATTT 
1051 CGCGATTTAA ACACCAGAAT CGGCAGCCGC AAAATCATCA ACCGCAATTA 
1101 TCAATTCAAT TACGGTTTGT CTTTGAACCC GTATACCAAC CTCAATCTGA 
1151 CCGCAGCCTA CAATTCGGGC AGGCAGAAAT ATCCGAAAGG GGCGAAGTTT 
1201 ACAGGCTGGG GGCTTTTAAA AGATTTTGAA ACCTACAACA ACGCGAAAAT 
1251 CCTCGACCTC AACAACACCG CCACCTTCCG GCTGCCCCGC GAAACCGAGT 
1301 TGCAAACCAC TTTGGGCTTC AATTATTTCC ACAACGAATA CGGCAAAAAC 
1351 CGCTTTCCTG AAGAATTGGG GCTGTTTTTC GACGGTCCTG ATCAGGACAA 
1401 CGGGCTTTAT TCCTATTTGG GGCGGTTTAA GGGCGATAAA GGGCTGTTGC 
1451 CTCAAAAATC AACCATTGTC CAACCGGCCG GCAGCCAATA TTTCAACACG 
1501 TTCTACTTCG ATGCCGCGCT CAAAAAAGAC ATTTACCGCT TAAACTACAG 
1551 CACCAATGCA ATCAACTACC GTTTCGGCGG CGAATATACG GGCTATTACG 
1601 GCTCGGAAAA CGAATTTAAG CGGGCATTCG GAGAAAACTC GCCGGCATAC 
1651 AAGGAACATT GCGACCCGAG CTGCGGGCTT TATGAACCCG TATTGAAAAA 
1701 ATACGGCAAA AAGCGCGCCA ACAACCATTC GGTCAGCATT AGTGCGGACT 
1751 TCGGCGATTA TTTCATGCCG TTCGCCGGCT ATTCGCGCAC ACACCGTATG 
1801 CCCAACATCC AAGAAATGTA TTTTTCCCAA ATCGGCGACT CCGGCGTTCA 
1851 CACCGCCTTA AAACCAGAGC GCGCAAACAC TTGGCAATTT GGCTTCAATA 
1901 CCTATAAAAA AGGATTGTTA AAACAAGATG ATATATTAGG ATTGAAACTG 
1951 GTCGGCTACC GCAGCCGCAT TGACAACTAC ATCCACAACG TTTACGGGAA 
2001 ATGGTGGGAT TTGAACGGGG ATATTCCGAG CTGGGTCGGC AGCACCGGGC 
2051 TTGCCTACAC CATCCGACAC CGCAATTTCA AAGACAAAGT GCACAAACAC 
2101 GGTTTTGAGC TGGAGCTGAA TTACGATTAT GGGCGTTTTT TCACCAACCT 
2151 TTCTTACGCC TATCAAAAAA GCACGCAACC GACCAATTTC AGCGATGCGA 
2201 GCGAATCGCC CAACAATGCC tccaaAGAAG ACCAACTCAA ACAAGGTTAT 
2251 GGGCTGAGCA GGGTTTCCGC CCTGCCGCGA GATTACGGAC GTTTGGAAGT 
2301 CGGTACGCGC TGGTTGGGCA ACAAACTGAC TTTGGGCGGC GCGAtgcGCT 
2351 ATTTCGGCAA GAGCATCCGC GCGACGGCTG AAGAACGCTA TATCGACGGC 
2401 ACCAACGGGG GAAATACCAG CAATGTCCGG CAACTGGGCA AGCGTTCCAT 
2451 CAAACAAACC GAAACCCTTG CCCGACAGCC TTTGATTTTT GATTTTTACG 
2501 CCGCTTACGA GCCGAAGAAA AACCTTATTT TCCGCGCCGA AGTCAAAAAC 
2551 CTGTTCGACA GGCGTTATAT CGATCCGCTC GATGCGGGCA ATGATGCGGC 
2601 AACGCAGCGT TATTACAGCT CGTTCGACCC GAAAGACAAG GACGAAGACG 
2651 TAACGTGTAA TGCTGATAAA ACGTTGTGCA ACGGCAAATA CGGCGGCACA 
2701 AGCAAAAGCG TATTGACCAA TTTCGCACGC GGACGCACCT TCTTGATGAC 
2751 GATGAGCTAC AAGTTTTAA 
This corresponds to the amino acid sequence <SEQ ID 884; ORF133ng-1>: 



1 MRSSFRLKPI CFYLMGVMLY HHSYAEDAGR AGSEAQIQVL EDVHVKAKRV 
51 PKDKKVFTDA RAVSTRQDVF KSGENLDNIV RSIPGAFTQQ DKSSGIVSLN 
101 IRGDSGFGRV NTMVDGITQT FYSTSTDAGR AGGSSQFGAS VDSNFIAGLD 
151 VVKGSFSGSA GINSLAGSAN LRTLGVDDVV QGNNTYGLLL KGLTGTNSTK 
201 GNAMAAIGAR KWLESGASVG VLYGHSRRGV AQNYRVGGGG QHIGNFGEEY 
251 LERRKQQYFV QEGGLKFNAG SGKWERDLQR QYWKTKWYKK YEDPQELQKY 
301 IEEHDKSWRE NLAPQYDITP IDPSGLKQQS AGNLFKLEYD GVFNKYTAQF 
351 RDLNTRIGSR KIINRNYQFN YGLSLNPYTN LNLTAAYNSG RQKYPKGAKF 
401 TGWGLLKDFE TYNNAKILDL NNTATFRLPR ETELQTTLGF NYFHNEYGKN 
451 RFPEELGLFF DGPDQDNGLY SYLGRFKGDK GLLPQKSTIV QPAGSQYFNT 
501 FYFDAALKKD IYRLNYSTNA INYRFGGEYT GYYGSENEFK RAFGENSPAY 
551 KEHCDPSCGL YEPVLKKYGK KRANNHSVSI SADFGDYFMP FAGYSRTHRM 
601 PNIQEMYFSQ IGDSGVHTAL KPERANTWQF GFNTYKKGLL KQDDILGLKL 
651 VGYRSRIDNY IHNVYGKWWD LNGDIPSWVG STGLAYTIRH RNFKDKVHKH 
701 GFELELNYDY GRFFTNLSYA YQKSTQPTNF SDASESPNNA SKEDQLKQGY 
751 GLSRVSALPR DYGRLEVGTR WLGNKLTLGG AMRYFGKSIR ATAEERYIDG 
801 TNGGNTSNVR QLGKRSIKQT ETLARQPLIF DFYAAYEPKK NLIFRAEVKN 
851 LFDRRYIDPL DAGNDAATQR YYSSFDPKDK DEDVTCNADK TLCNGKYGGT 
901 SKSVLTNFAR GRTFLMTMSY KF* 



ORF133ng-1 and ORF133-1 show 96.2% identity in 889 aa overlap: 



30 



rfl33ng-l.pep SFRLKPICFYLMGVMLYHHSYAEDAGRAGSEAQIQVLEDVHVKAKRVPKDKKVFT DARAV 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
r f 1 3 3 - 1 EAQI QVLEDVHVKAKRVPKDKKVFT DARAV 



10 



20 



30 



70 80 90 100 110 120 

orf 133ng-l . pep STRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS 
I II I I : I I I : I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I i I I 
orf 133-1 STRQDIFKSSENLDNIVRSIPGAFTQQDKSSGIVSLNIRGDSGFGRVNTMVDGITQTFYS 



50 



60 



70 



80 



90 



130 140 150 160 170 180 

orfl33ng-l .pep TSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFSGSAGINSLAGSANLRTLGVDDVVQGN 
I I I I I I I I I I I I I I I I I I I I I I I II I I I I 1 I ! I I I I I I I I I I I I I I I I I I I I I I I I I M I 
orf 133-1 TSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFSGSAGINSLAGSANLRTLGVDDWQGN 
100 110 120 130 140 150 

190 200 210 220 230 240 

orf 133ng-l . pep NT YGLLLKGLTGTN S TKGNAMAAI GARKWLE S GAS VGVL YGHS RRGVAQNYRVGGGGQH I 



250 260 270 280 290 300 

. pep GNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERDLQRQYWKTKWYKKYEDPQELQKYIEE 



60 



310 320 330 340 350 360 

orf 133ng-l .pep HDKSWRENLAPQYDITPIDPSGLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTRIGSRKII 
I I I I I I I II I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I II I I 
orf 133-1 HDKSWRENLXPQYDITPIDPSSLKQQSAGNLFKLEYDGVFNKYTAQFRDLNTKIGSRKII 
270 280 290 300 310 320 

370 380 390 400 410 420 

orf 133ng-l . pep NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGAKFTGWGLLKDFETYNNAKILDLNNT 
I I I I I I I I I I I I I II I I I I I I I I I II I I I I I I I I : I I I I I I II I I I I I I I I II I I I I I I I 
orf 133-1 NRNYQFNYGLSLNPYTNLNLTAAYNSGRQKYPKGSKFTGWGLLKDFETYNNAKILDLNNT 
330 340 350 360 370 380 

430 440 450 460 470 480 

orfl33ng-l .pep ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 
I I I I I I I I I I I I I I I I I I M I I II I I I I I I I I I I II II II I I I I I I I I I I I I I I I I I I I I 
orf 133-1 ATFRLPRETELQTTLGFNYFHNEYGKNRFPEELGLFFDGPDQDNGLYSYLGRFKGDKGLL 
390 400 410 420 430 440 

490 500 510 520 530 540 

orfl33ng-l.pep PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNAINYRFGGEYTGYYGSENEFKRAF 
I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ::: I I I I I I I I I I I I I :: I ! I I I ! 
orf 133-1 PQKSTIVQPAGSQYFNTFYFDAALKKDIYRLNYSTNTVGYRFGGEYTGYYGSDDEFKRAF 
450 460 470 480 490 500 

550 560 570 580 590 600 

orf 133ng-l .pep GENSPAYKEHCDPSCGLYEPVLKKYGKKRANNHSVSISADFGDYFMPFAGYSRTHRMPNI 



610 620 630 640 650 660 

orfl33ng-l .pep QEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDILGLKLVGYRSRIDNYIHN 
I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I! I i I I I I I I I I I II I M I I I I I 
orf 133-1 QEMYFSQIGDSGVHTALKPERANTWQFGFNTYKKGLLKQDDTLGLKLVGYRSRIDNYIHN 
570 580 590 600 610 620 

670 680 690 700 710 720 

orfl33ng-l .pep VYGKWWDLNGDIPSWVGSTGLAYTIRHRNFKDKVHKHGFELELNYDYGRFFTNLSYAYQK 



730 740 750 760 770 780 

irf!33ng-l.pep STQPTNFSDASESPNNASKEDQLKQGYGLSRVSALPRDYGRLEVGTRWLGNKLTLGGAMR 



790 800 810 820 830 840 

orfl33ng-l .pep YFGKSIRATAEERYIDGTNGGNTSNVRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLI 
I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I ! I i I I I I I I I I I I I I I I 
orf 133-1 YFGKSIRATAEERYIDGTNGGNTSNFRQLGKRSIKQTETLARQPLIFDFYAAYEPKKNLI 
750 760 770 780 790 800 

850 860 870 880 890 900 

orfl33ng-l .pep FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS 
I I I I I I I I I I I I! II I I I i I I I I I II I I I I I I M I! I I I I I I I I I I I I I I I I I I I I I I I I 
orf 133-1 FRAEVKNLFDRRYIDPLDAGNDAATQRYYSSFDPKDKDEDVTCNADKTLCNGKYGGTSKS 
810 820 830 840 850 860 



50 



orfl33ng-l.; 
orfl33-l 

In addition, ORF133ng-1 is homologous to a TonB-dependent receptor in H. influenzae: 

sp|P45114|YC17_HAEIN PROBABLE TONB-DEPENDENT RECEPTOR HI1217 PRECURSOR 
>gi|1075372|pir||G64110 transferrin binding protein 1 precursor (tbp1) homolog - 
Haemophilus influenzae (strain Rd KW20) >gi|1574147 (U32801) transferrin binding 
protein 1 precursor (tbp1) [Haemophilus influenzae] Length = 913 
Score = 930 bits (2377), Expect = 0.0 

Identities = 476/921 (51%), Positives = 619/921 (66%), Gaps = 72/921 (7%) 

QVLEDVHVKAKRVPKDKKVFTDARAVSTRQDVFKSGENLDNIVRSIPGAFTQQDKSSGIV 97 
+ L + V K + DKK FT+A+A STR++VFK + +D ++RSIPGAFTQQDK SG+V 
ETLGQIDWEKVISNDKKPFTEAKAKSTRENVFKETQTIDQVIRSIPGAFTQQDKGSGW 88 

SLNIRGDSGFGRVNTMVDGITQTFYSTSTDAGRAGGSSQFGASVDSNFIAGLDWKGSFS 157 
S+NIRG++G GRVNTMVDG+TQT FYS T+ D+G++GGSSQFGA++D NFIAG+DV K +FS 
SVNIRGENGLGRVNTMVDGVTQTFYSTALDSGQSGGSSQFGAAIDPNFIAGVDVNKSNFS 14 8 

G SAG INS LAG SANLRT LGVDD WQXXXXXXXXXXXXXXXXXXXXXAMAAIGARKWLE S GA 217 
G++GIN+LAGSAN RTLGV+DV+ M RKWL++G 



60 


Query: 


38 








Sbjct: 


29 




Query: 


98 


65 


Sbjct: 


89 




Query: 


158 
Sbjct: 


149 


Query: 


218 


Sbjct: 


209 


Query: 


278 


Sbjct: 


266 


Query: 


304 


Sb j ct : 


326 


Query: 


364 


Sbjct: 


385 


Query: 


424 


Sbjct: 


445 


Query: 


482 


Sbjct: 


505 


Query: 


542 


Sbjct: 


556 


Query: 


602 


Sbjct: 


605 


Query: 


662 


Sbjct: 


665 


Query: 


722 


Sbjct: 


723 


Query: 


782 


Sbjct: 


783 


Query: 


842 


Sbjct : 


842 




902 


Sbjct: 


893 



149 GASGINALAGSANFRTLGVNDVITDDKPFGIILKGMTGSNATKSNFMTMAAGRKWLDNGG 208 

IVGVLYGHSRRGVAQNYRVGGGGQHIGNFGEEYLERRKQQYFVQEGGLKFNAGSGKWERD 277 
VGV+YG+S+R V+Q+YR+ GGG+ + + G++ L + K+ YF + G N G+W D 



L +++W 



-TKWY-- 
+Y 



PI+P L+ • 



-KKYEDPQELQK YIEE 303 

KK +D ++LQK IEE 



AQ R L+ +IGSRKI 



■ N Y +LNL AA+N G+ YPKG F GW + T N A I+D+NN+ 



TF LP+E +L+TTLGFNYF NEY KNRFPEEL LF++ 



LLPQ+S I+QP+G Q F T YFD AL K IY LNYS N +Y F GEY GY 



EN+ + + EP+L K G K+A NHS ++SA+ DYFMPF YSRTHRMP 

-ENTAGQQ I NEPILHKSGHKKAFNHSATLSAELSDYFMPFFT YSRTHRMP 604 



NIQEM+FSQ+ ++GV+TALKPE+++T+Q GFNTYKKGL QDD+LG+KLVGYRS I NY I 



+P+W S G YTI H+N+K V K G ELE+NYD GRFF N+SYAY 



Q++ QPTN++DAS PNNAS+ED LKQGYGLSRVS LP+DYGRLE+GTRW 



RY+GKS RAT EE YI+G+ 



++K+TE + +QP+I D ■ 



LI +AEV+NL D+RY+DPLDAGNDAA+QRYYSS 



K+VL NFARGRT+- 



The underlined motif in the gonococcal protein (also present in the meningococcal protein) is 
predicted to be an ATP/GTP-binding site motif A (P-loop), and the analysis suggests that these 
proteins from N. meningitidis and N. gonorrhoeae, and their epitopes, could be useful antigens for 
vaccines or diagnostics, or for raising antibodies. 
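
By way of illustration only, a P-loop prediction of this kind can be sketched as a simple pattern scan. The pattern used below, [AG]-x(4)-G-K-[ST], is the commonly cited PROSITE-style consensus for ATP/GTP-binding site motif A; it is an assumption here, since the text does not state which pattern or program was used. The function name `find_p_loops` is hypothetical. The test fragment is residues 771-800 of the gonococcal sequence as listed above; whether the hit it contains is the motif that was underlined in the original cannot be confirmed from the extracted text.

```python
import re

# Assumed consensus for ATP/GTP-binding site motif A (P-loop):
# [AG]-x(4)-G-K-[ST]. A lookahead is used so overlapping hits are reported.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(protein):
    """Return (1-based start, matched residues) for each P-loop-like hit."""
    return [(m.start() + 1, m.group(1)) for m in P_LOOP.finditer(protein)]


# Residues 771-800 of the ORF133ng sequence as listed in the text.
FRAGMENT = "WLGNKLTLGG" "AMRYFGKSIR" "ATAEERYIDG"

if __name__ == "__main__":
    for start, motif in find_p_loops(FRAGMENT):
        # Positions are local to the fragment; add 770 for full-length numbering.
        print(start, motif)
```
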



Example 104 



The following partial DNA sequence was identified in N. meningitidis <SEQ ID 885>: 

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG 

451 AAAGAAAAAA ACAGCGTGAT CAATGTGCGC GAAATGTTGC CCGACCAT . . 

This corresponds to the amino acid sequence <SEQ ID 886; ORF112>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL 

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSVINVR EMLPDH. . . 
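
The correspondence between a nucleotide listing and its amino acid sequence can be checked with a short translation routine. The following is a minimal sketch, assuming the standard genetic code in reading frame 1; the helper name `translate` is hypothetical, and any codon containing a base other than A, C, G or T (for example N, or an ambiguity code such as r or k) is rendered as X, which appears to match the convention used in the listings.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate DNA in frame 1; whitespace is ignored, case is folded."""
    seq = "".join(dna.split()).upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is dropped.
    for pos in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[pos:pos + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # First 30 nucleotides of the partial sequence listed above.
    print(translate("ATGAACCTGA TTTCACGTTA CATCATCCGT"))
```

Ignoring whitespace lets rows be pasted directly from the 10-base groups of a listing.
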

Further work revealed a further partial nucleotide sequence <SEQ ID 887>: 

1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 gGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TGATTCCCCT 

201 CGCCGTCCTT ATCGGCGGAC TGGTCTCCCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGAC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 

401 CCGCCGCCAT CAACGGCAAA ATCAGCACCG GCAATACCGG CCTTTGGCTG 

451 AAAGAAAAAA ACAGCrTkAT CAATGTGCGC GAAATGTTGC CCGACCATAC 

501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG 

601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC 

651 TATTGCGGCT GAAGAAAACT GGCCGATTTC CGTCAAACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAACCC GACCAAATGT CCGTCGGCGA ACTGACCACC 
751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCGAA TCTACGCCAT 

801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC 
851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC 
901 TTAAAACTCT TCGGCGGCAT CTGTsTCGGA TTGCTGTTCC ACCTTGCCGG 
951 ACGGCTCTTT GGGTTTACCA GCCAACTCGG . . . 

This corresponds to the amino acid sequence <SEQ ID 888; ORF112-1>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYELIPLAVL IGGLVSLSQL AAGSELTVIK ASGMSTKKLL 

101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKNSXINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSTLG EDKVEVSIAA EENWPISVKR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQNNSQ NTRIYAIAWW RKLVYPAAAW VMALVAFAFT PQTTRHGNMG 

301 LKLFGGICXG LLFHLAGRLF GFTSQL... 

Computer analysis of this amino acid sequence predicts two transmembrane domains and gave the 
following results: 

Homology with a predicted ORF from N. meningitidis (strain A) 

ORF112 shows 96.4% identity over a 166 aa overlap with an ORF (ORF112a) from strain A of N. 
meningitidis: 

10 20 30 40 50 60 

or f 112 .pep MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 

I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 

o r f 1 1 2 a MNLISRYI IRQMAVMAVYALLAFLAL YS FFE ILYETGNLGKGS YGIWEMXGYTALKMXAR 

10 20 30 40 50 60 



70 80 90 100 110 120 

orfll2.pep AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 
I | | | : I I M II II I I I I I I I I I I I I : I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I M I 
AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 
70 80 90 100 110 120 

130 140 150 160 

VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH 
I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I II : I I M I I I I I I 

VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN 
130 140 150 160 170 180 

ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP 



190 200 210 
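
An identity figure of the kind quoted above is the fraction of aligned columns in which both rows carry the same residue. The following is a minimal sketch, assuming two pre-aligned rows of equal length in which "-" marks a gap and an unknown residue (X) is counted as a mismatch; the alignment program actually used may score such positions differently, and the function name `percent_identity` is hypothetical. The example rows are the first 60 columns of the ORF112/ORF112a alignment shown above.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Return (matches, columns, percent) for two pre-aligned rows.

    Every aligned column counts toward the overlap length; a column is a
    match only when both rows hold the same non-gap residue.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        raise ValueError("aligned rows must not be empty")
    matches = sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)
    return matches, len(row_a), 100.0 * matches / len(row_a)


# First 60 aligned columns, written as 10-residue groups for readability.
ORF112 = ("MNLISRYIIR" "QMAVMAVYAL" "LAFLALYSFF"
          "EILYETGNLG" "KGSYGIWEML" "GYTALKMPAR")
ORF112A = ("MNLISRYIIR" "QMAVMAVYAL" "LAFLALYSFF"
           "EILYETGNLG" "KGSYGIWEMX" "GYTALKMXAR")

if __name__ == "__main__":
    matches, columns, pct = percent_identity(ORF112, ORF112A)
    print(f"{matches}/{columns} identical ({pct:.1f}%)")
```
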

The ORF112a nucleotide sequence <SEQ ID 889> is: 



1 ATGAACCTGA TTTCACGTTA CATCATCCGT CAAATGGCGG TTATGGCGGT 
51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 
101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGNTG 
151 GGNTACACCG CCCTCAAAAT GNCCGCCCGC GCCTACGAAC TGATGCCCCT 
201 CGCCGTCCTT ATCGGCGGAC TGGTCTCTNT CAGCCAGCTT GCCGCCGGCA 
251 GCGAACTGAN CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 
301 TTGATTCTGT CGCAGTTCGG TTTTATTTTT GCTATTGCCA CCGTCGCGCT 
351 CGGCGAATGG GTTGCGCCCA CACTGAGCCA AAAAGCCGAA AACATCAAAG 
401 CCGCGGCCAT CAACGGCAAA ATCAGTACCG GCAATACCGG CCTTTGGCTG 
451 AAAGAAAAAA ACAGCATTAT CAATGTGCGC GAAATGTTGC CCGACCATAC 
501 CCTGCTGGGC ATTAAAATCT GGGCCCGCAA CGATAAAAAC GAACTGGCAG 
551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGTTGGCAG 
601 TTGAAAAACA TCCGCCGCAG CACGCTTGGC GAAGACAAAG TCGAGGTCTC 
651 TATTGCGGCT GAAGAAAANT GGCCGATTTC CGTCAAACGC AACCTGATGG 
701 ACGTATTGCT CGTCAAACCC GACCAAATGT CCGTCGGCGA ACTGACCACC 
751 TACATCCGCC ACCTCCAAAN NNACAGCCAA AACACCCGAA TCTACGCCAT 
801 CGCATGGTGG CGCAAATTGG TTTACCCCGC CGCAGCCTGG GTGATGGCGC 
851 TCGTCGCCTT TGCCTTTACC CCGCAAACCA CCCGCCACGG CAATATGGGC 
901 TTAAAANTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG 
951 NCGGCTCTTC NGGTTTACCA GCCAACTCTA CGGCATCCCG CCCTTCCTCG 
1001 NCGGCGCACT ACCTACCATA GCCTTCGCCT TGCTCGCCGT TTGGCTGATA 
1051 CGCAAACAGG AAAAACGCTA 



This encodes a protein having the amino acid sequence <SEQ ID 890>: 

1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEMX 
51 GYTALKMXAR AYELMPLAVL IGGLVSXSQL AAGSELXVIK ASGMSTKKLL 
101 LILSQFGFIF AIATVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 
151 KEKNSIINVR EMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 
201 LKNIRRSTLG EDKVEVSIAA EEXWPISVKR NLMDVLLVKP DQMSVGELTT 
251 YIRHLQXXSQ NTRIYAIAWW RKLVYPAAAW VMALVAFAFT PQTTRHGNMG 
301 LKXFGGICLG LLFHLAGRLF XFTSQLYGIP PFLXGALPTI AFALLAVWLI 
351 RKQEKR* 



ORF112a and ORF112-1 show 96.3% identity in 326 aa overlap: 



orf 112a . pep MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMXGYTALKMXAR 
II I I II II I I I I I I I M I I I I I I I I I I I I II I I I I I I I I I I I I I I 1 I I I I I I I I I I || 
or f 1 12 - 1 MNLI SRYI IRQMAVMAVYALLAFLALYS FFEILYETGNLGKGS YGIWEMLGYTALKMPAR 



orfll2a.pep 



AYELMPLAVLIGGLVSXSQLAAGSELXVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

1111:111 I I I I I I I I I I = I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I 

rf 112-1 AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 

rf 112a. pep VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSIINVREMLPDHTLLGIKIWARNDKN 
I I I I I II M I II I I I I I I I I I I I I I I I I I I I 1 I I I I I II I I II II II I I I I I I I M II I 
rf 112-1 VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSXINVREMLPDHTLLGIKIWARNDKN 

rf 112a . pep ELAEAVEADSAVLNSDGSWQLECNIRRSTLGEDKVEVSIAAEEXWPISVKRNLMDVLLVKP 
I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I II I I I I I I I I I I I I I I I I I I I I 
rf 112-1 ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP 



orfll2a.pep 



DQMSVGELTTYIRHLQXXSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 
I I I I I I I I I I I I I I I I I I II I I [ M I I I I I 1 I I I I I I I I I I I I | | | | | | 
DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 

LKXFGGICLGLLFHLAGRLFXFTSQLYGIPPFLXGALPTIAFALLAVWLIRKQEKRX 

II I I I I I I I I I I I I I I II I IE I I 

LKLFGGICXGLLFHLAGRLFGFTSQL 



Homology with a predicted ORF from N. gonorrhoeae 

ORF112 shows 95.8% identity over 166 aa overlap with a predicted ORF (ORF112ng) from N. 
gonorrhoeae: 

orfll2 .pep MNL I SRY I IRQMAVMAVYALLAFL AL YS FFE I LYE TGNLGKGS YG I WEMLGYT ALKMPAR 60 

I M M I I I I I I I I I M I I I I I I I I M II I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I I I 
orfll2ng MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYTALKMPAR 60 

orf 112 .pep AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 12 0 

I I I I : I I I I I I I I I : I I I I I II I I I I : II I I I I I I II I I I I I I I I I I I I I I I I : I I I I I I 
orfll2ng AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW 120 

orf 112. pep VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKNSVINVREMLPDH 166 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I : I : I I I I I I I I I 
orfll2ng VAPTLSQKAENI KAAAINGKI S TGNT GLWLKEKT S I INVRGMLPDHT LLGI KIWARNDKN 18 0 

The complete length ORF112ng nucleotide sequence <SEQ ID 891> is: 



1 ATGAACCTGA TTTCACGTTA CATCATCCGC CAAATGGCGG TTATGGCGGT 

51 TTACGCGCTC CTTGCCTTCC TCGCTTTGTA CAGCTTTTTT GAAATCCTGT 

101 ACGAAACCGG CAACCTCGGC AAAGGCAGTT ACGGCATATG GGAAATGCTG 

151 GGCTACACCG CCCTCAAAAT GCCCGCCCGC GCCTACGAAC TCATGCCCCT 

201 CGCCGTCCTC ATCGGCGGAC TGGCCTCTCT CAGCCAGCTT GCCGCCGGCA 

251 GCGAACTGGC CGTCATCAAA GCCAGCGGCA TGAGCACCAA AAAGCTGCTG 

301 TTGATTCTGT CTCAGTTCGG TTTTATTTTT GCTATTGCCG CCGTCGCGCT 

351 CGGCGAATGG GTTGCGCCCA CGCTGAGCCA AAAAGCCGAA AACATCAAag 

401 cCGCCGCCAt taacggCAAA ATCAGCAccg gcAATACCGG CCTTTggcTG 

451 AAAGAAAAAa ccAGCATTAT CAATGTGcGc GGAATGTTGC CCGACCATAC 

501 GCTTTTGGGC ATCAAAATTT GGGCGCGCAA CGATAAAAAC GAATTGGCAG 

551 AGGCAGTGGA AGCCGATTCC GCCGTTTTGA ACAGCGACGG CAGCTGGCAG 

601 TTGAAAAACA TCCGCCGCAG CATCATGGGT ACAGACAAAA TCGAAACATC 

651 cgCCGCCGCC GAAGAAACTT gGCCGATTGC CGTCAGACGC AACCTGATGG 

701 ACGTATTGCT CGTCAAGCCC GACCAAATGT CCGTCGGCGA GCTGACCACC 

751 TACATCCGCC ACCTCCAAAA CAACAGCCAA AACACCCAAA TCTACGCCAT 

801 CGCATGGTGG CGTAAACTCG TTTACCCCGT CGCCGCATGG GTCATGGCGC 

851 TCGTTGCCTT CGCCTTTACG CCGCAAACCA CGCGCCACGG CAATATGGGC 

901 TTAAAACTCT TCGGCGGCAT CTGTCTCGGA TTGCTGTTCC ACCTTGCCGG 

951 CAGGCTCTTC GGGTTTACCA GCCAACTCTA CGGCACCCCA CCCTTCCTCG 

1001 CCGGCGCACT GCCTACCATA GCCTTCGCCT TGCTCGCTGT TTGGCTGATA 

1051 CGCAAACAGG AAAAACGTTG A 

This encodes a protein having amino acid sequence <SEQ ID 892>: 



1 MNLISRYIIR QMAVMAVYAL LAFLALYSFF EILYETGNLG KGSYGIWEML 

51 GYTALKMPAR AYELMPLAVL IGGLASLSQL AAGSELAVIK ASGMSTKKLL 

101 LILSQFGFIF AIAAVALGEW VAPTLSQKAE NIKAAAINGK ISTGNTGLWL 

151 KEKTSIINVR GMLPDHTLLG IKIWARNDKN ELAEAVEADS AVLNSDGSWQ 

201 LKNIRRSIMG TDKIETSAAA EETWPIAVRR NLMDVLLVKP DQMSVGELTT 

251 YIRHLQNNSQ NTQIYAIAWW RKLVYPVAAW VMALVAFAFT PQTTRHGNMG 

301 LKLFGGICLG LLFHLAGRLF GFTSQLYGTP PFLAGALPTI AFALLAVWLI 

351 RKQEKR* 

ORF112ng and ORF112-1 show 94.2% identity in 326 aa overlap: 



MNL I SRYIIRQMAVMAVYALLAFLALYS FFE ILYETGNLGKGSYGIWEMLGYT ALKMPAR 
I II II I i I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 1 I I I I I I I I 
MNLISRYIIRQMAVMAVYALLAFLALYSFFEILYETGNLGKGSYGIWEMLGYT ALKMPAR 



70 



90 100 110 120 
AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW 
| | | I : I I I I 1 I I I I : I I I I I I I I I I I : I I I I I I I I I ! I I I M I I 1 I I I I I I I I : M I I I I 
AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW 
70 80 90 100 110 120 

130 140 150 160 170 180 

VAPTLSQKAENIKAAAINGKISTGNTGLWLKEKTSIINVRGMLPDHTLLGIKIWARNDKN 



130 140 150 160 170 180 

190 200 210 220 230 240 

ELAEAVEADSAVLNSDGSWQLKNIRRSIMGTDKIETSAAAEETWPIAVRRNLMDVLLVKP 
I I I I I I I I I I I I I I I I I I I I I I I I I I I •- I I I : I : I I 1 I I : 11 I : I : I I I I I I I I I I I 
ELAEAVEADSAVLNSDGSWQLKNIRRSTLGEDKVEVSIAAEENWPISVKRNLMDVLLVKP 

190 200 210 220 230 240 

250 260 270 280 290 300 

DQMSVGELTTYIRHLQNNSQNTQIYAIAWWRKLVYPVAAWVMALVAFAFTPQTTRHGNMG 
I I I I I I I I I I I I I I M I I 1 I I I : 1 I I I I I I I 1 I I I 1 : I 1 I M I I I I I I I I I I I M I I I I I 
DQMSVGELTTYIRHLQNNSQNTRIYAIAWWRKLVYPAAAWVMALVAFAFTPQTTRHGNMG 
250 260 270 280 290 300 

310 320 330 340 350 

LKLFGGICLGLLFHLAGRLFGFTSQLYGTPPFLAGALPTIAFALLAVWLIRKQEKRX 
|||||||| ||||||||||||||||| 
LKLFGGICXGLLFHLAGRLFGFTSQL 

310 320 



This analysis suggests that these proteins from N. meningitidis and N. gonorrhoeae, and their 
epitopes, could be useful antigens for vaccines or diagnostics, or for raising antibodies. 
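The identity figure quoted for the alignment is the share of aligned positions at which both sequences carry the same residue. A minimal sketch of that calculation follows, assuming a gap-free alignment and using only the 60-residue block numbered 70 to 120 above; the function name `percent_identity` is illustrative, and the value for this single block is not expected to equal the 94.2% reported over the full 326 aa overlap.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions with identical residues in a gap-free alignment."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


# The two rows of the alignment block covering positions 61-120.
top = "AYELMPLAVLIGGLASLSQLAAGSELAVIKASGMSTKKLLLILSQFGFIFAIAAVALGEW"
bottom = "AYELIPLAVLIGGLVSLSQLAAGSELTVIKASGMSTKKLLLILSQFGFIFAIATVALGEW"

print(f"{percent_identity(top, bottom):.1f}% identity over {len(top)} aa")
```

For this block, 56 of the 60 positions are identical; the four differences are all substitutions.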



Example 105 



Table III lists several Neisseria strains which were used to assess the conservation of the sequence 
of ORF 4 among different strains. 



TABLE III - List of Neisseria Strains Used for Gene Variability Study of ORF 4 



ORF4 gene variability: List of used Neisseria strains 

Identification number    Strains             Source / reference 

Group B 
zv01_4                   NG6/88              R. Moxon / Seiler et al., 1996 
zv02_4                   BZ198               R. Moxon / Seiler et al., 1996 
zv03_4ass                NG3/88              R. Moxon / Seiler et al., 1996 
zv04_4                   297-0               R. Moxon / Seiler et al., 1996 
zv05_4                   1000                R. Moxon / Seiler et al., 1996 
zv06_4                   BZ147               R. Moxon / Seiler et al., 1996 
zv07_4                   BZ169               R. Moxon / Seiler et al., 1996 
zv08_4                   528                 R. Moxon / Seiler et al., 1996 
zv09_4                   NGP165              R. Moxon / Seiler et al., 1996 
zv10_4                   BZ133               R. Moxon / Seiler et al., 1996 
zv11_4                   NGE31               R. Moxon / Seiler et al., 1996 
zv12_4ass                NGF26               R. Moxon / Seiler et al., 1996 
zv13_4                   NGE28               R. Moxon / Seiler et al., 1996 
zv15_4                   SWZ107              R. Moxon / Seiler et al., 1996 
zv16_4                   NGH15               R. Moxon / Seiler et al., 1996 
zv17_4                   NGH36               R. Moxon / Seiler et al., 1996 
zv18_4                   BZ232 
zv19_4                   BZ83                R. Moxon / Seiler et al., 1996 
zv20_4                   44/76               R. Moxon / Seiler et al., 1996 
zv21_4                   MC58                R. Moxon 
zv96_4                   2996                Our collection 

Group A 
zv22_4                   205900              R. Moxon 
z2491_4                  Z2491               R. Moxon / Maiden et al., 1998 

Group C 
zv24_4                   90/18311            R. Moxon 
zv25_4                   93/4286             R. Moxon 

Others 
zv26_4ass                A22 (group W)       R. Moxon / Maiden et al., 1998 
zv27_4                   E26 (group X)       R. Moxon / Maiden et al., 1998 
zv28_4                   860800 (group Y)    R. Moxon / Maiden et al., 1998 
zv29_4                   E32 (group Z)       R. Moxon / Maiden et al., 1998 

Gonococcus 
zv32_4                   Ng F62              R. Moxon / Maiden et al., 1998 
zv33_4                   Ng SN4              R. Moxon 
fa1090_4                 FA1090              R. Moxon 


References: 






Seiler A. et al., Mol. Microbiol., 1996, 19(4):841-856. 

Maiden et al., Proc. Natl. Acad. Sci. USA, 1998, 95:3140-3145. 



The amino acid sequences for each listed strain are as follows: 

>FA1090_4 <SEQ ID 893> 

MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDWK 
EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF 
QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS 
KADIAENLKHIKIVELEAAQLPRSRADVDFAVOTGNYAISSGMKLTEALFQEPSFAYVNW 
SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAK* 

>Z2491_4 <SEQ ID 894> 
MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV01_4 <SEQ ID 895> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKIVEFTDYVRPULALAEGELDINVFQHKPYLDDFKKEHHLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKWIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV02_4 <SEQ ID 896> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAARNEGAAK* 

>ZV03_4ASS <SEQ ID 897> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV04_4 <SEQ ID 898> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGHYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV05_4 <SEQ ID 899> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV06_4 <SEQ ID 900> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTAHKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV07_4 <SEQ ID 901> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV08_4 <SEQ ID 902> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVELVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHHLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV09_4 <SEQ ID 902> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV10_4 <SEQ ID 903> 
MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAEKLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV11_4 <SEQ ID 904> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQVELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVWGNYAISSGMKLTEALFQEPSFAYVHWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV12_4ASS <SEQ ID 905> 

MKTFFKTLSAAALALILAACGGQKDRAPAASASAASENGAAKKEILFGTTVGDLGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV13_4 <SEQ ID 906> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWKGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV15_4 <SEQ ID 907> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPHLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAGNEGAAK* 

>ZV16_4 <SEQ ID 908> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKKIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV17_4 <SEQ ID 909> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAAAraGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV18_4 <SEQ ID 910> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAEHLKMIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV19_4 <SEQ ID 911> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVELVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKKIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV20_4 <SEQ ID 912> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVELVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV21_4 <SEQ ID 913> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVVNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV22_4 <SEQ ID 914> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDLVKE 
QIQPELEKKGYTVELVEFTDYVRPNLALGEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 



>ZV24_4ASS <SEQ ID 915> 
MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVELVEFTDDVRPNLALGEGELDIIVFQHKPYLDDFKKEQNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV25_4 <SEQ ID 916> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV26_4 <SEQ ID 917> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVWGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV27_4 <SEQ ID 918> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAVWGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV28_4 <SEQ ID 919> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
HIQPELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHHLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYWWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV29_4 <SEQ ID 920> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAAKKEIVFGTTVGDFGDMVKE 
QIQVELEKKGYTVKLVEFTDYVRPNLALAEGELDIHVFQHKPYLDDFKKEHHLDITEVFQ 
VPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARVLVMLDELGWIKLKDGINPLTASK 
ADIAENLKNIKIVELEAAQLPRSRADVDFAWHGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 

>ZV32_4 <SEQ ID 921> 

MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDMVK 
EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF 
QVPTAPLGLYPGKLKSLEEVKDGSTVSAPNDPSNFARALVMLNELGWIKLKDGINPLTAS 
KADIAENLKNIKIVELEAAQLPRSRADVDFAWNGKYAISSGMKLTEALFQEPSFAYVNW 
SAVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKYPAAWNEGAAK* 

>ZV33_4 <SEQ ID 922> 

MKTFFKTLSAAALALILAACGGQKDSAPAASAAAPSADNGAAKKEIVFGTTVGDFGDMVK 
EQIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEAF 
QVPTAPLGLYPGKLKSLEEVKDGSTVSAPKDPSNFARALVMLNELGWIKLKDGINPLTAS 
KADIAENLKNIKIVELEAAQLPRSRADVDFAWNGKYAISSGMKLTEALFQEPSFAYVNW 
SAVKTADKDSQWLKDVTEAYWSDAFKAYAHKRFEGYKYPAAWNEGAAK* 

>ZV96_4 <SEQ ID 923> 

MKTFFKTLSAAALALILAACGGQKDSAPAASASAAADNGAEKKEIVFGTTVGDFGDMVKE 
QIQAELEKKGYTVKLVEFTDYVRPNLALAEGELDINVFQHKPYLDDFKKEHNLDITEVFQ 



ADIAENLKNIKIVELEAAQLPRSRADVDFAWNGNYAISSGMKLTEALFQEPSFAYVNWS 
AVKTADKDSQWLKDVTEAYNSDAFKAYAHKRFEGYKSPAAWNEGAAK* 



Figure 8 shows the results of aligning the sequences of each of these strains. Dark shading 
indicates regions of homology, and gray shading indicates the conservation of amino acids with 
similar characteristics. As is readily discernible, there is significant conservation among the 
various strains of ORF 4, further confirming its utility as an antigen for both vaccines and 
diagnostics. 
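Conservation across strains can also be summarised numerically, column by column, once the sequences are aligned. The sketch below is an illustration only and is not the method used to produce Figure 8: it takes residues 31 to 50 of three of the listed ORF 4 sequences as they appear in the listing (these three have equal length over this window, so no gaps are needed) and reports, for each column, the fraction of strains sharing the most common residue. The function name `column_conservation` is hypothetical.

```python
from collections import Counter


def column_conservation(seqs):
    """Fraction of sequences sharing the most common residue, per aligned column."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    scores = []
    for column in zip(*seqs):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(seqs))
    return scores


# Residues 31-50 of three listed strains, copied from the listing as printed.
fragments = {
    "Z2491_4": "SASAAADNGAAKKEIVFGTT",
    "ZV02_4": "SASAAADNGAEKKEIVFGTT",
    "ZV12_4ASS": "SASAASENGAAKKEILFGTT",
}

scores = column_conservation(list(fragments.values()))
variable = [i + 31 for i, s in enumerate(scores) if s < 1.0]
print("fully conserved columns:", sum(s == 1.0 for s in scores), "of", len(scores))
print("variable positions:", variable)
```

In this window, 16 of the 20 columns are identical in all three strains. Extending the calculation to the gonococcal sequences would first require a gapped alignment, since they carry extra residues near position 33.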

It will be appreciated that the invention has been described by means of example only, and that 
modifications may be made whilst remaining within the spirit and scope of the invention. 

CLAIMS 

1. A protein comprising an amino acid sequence selected from the group consisting of SEQ 
IDs 2, 4, 6, and 8. 

2. A nucleic acid molecule which encodes a protein according to claim 1. 

3. A nucleic acid molecule according to claim 2, comprising a nucleotide sequence selected 
from the group consisting of SEQ IDs 1, 3, 5, and 7. 

4. A protein comprising an amino acid sequence selected from the group consisting of SEQ 
IDs 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 
54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 100, 102, 
104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 136, 138, 140, 142, 
144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 176, 178, 180, 182, 
184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 216, 218, 220, 222, 
224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 256, 258, 260, 262, 
264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 296, 298, 300, 302, 
304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 336, 338, 340, 342, 
344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 376, 378, 380, 382, 
384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414, 416, 418, 420, 422, 
424, 426, 428, 430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454, 456, 458, 460, 462, 
464, 466, 468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492, 494, 496, 498, 500, 502, 
504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534, 536, 538, 540, 542, 
544, 546, 548, 550, 552, 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574, 576, 578, 580, 582, 
584, 586, 588, 590, 592, 594, 596, 598, 600, 602, 604, 606, 608, 610, 612, 614, 616, 618, 620, 622, 
624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 656, 658, 660, 662, 
664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 696, 698, 700, 702, 
704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 736, 738, 740, 742, 
744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 766, 768, 770, 772, 774, 776, 778, 780, 782, 
784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 816, 818, 820, 822, 
824, 826, 828, 830, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 856, 858, 860, 862, 
864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, and 892. 



5. A protein having 50% or greater sequence identity to a protein according to claim 4. 

6. A protein comprising a fragment of an amino acid sequence selected from the group 
consisting of SEQ IDs 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 
44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80, 82, 84, 86, 88, 90, 92, 94, 
96, 98, 100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126, 128, 130, 132, 134, 
136, 138, 140, 142, 144, 146, 148, 150, 152, 154, 156, 158, 160, 162, 164, 166, 168, 170, 172, 174, 
176, 178, 180, 182, 184, 186, 188, 190, 192, 194, 196, 198, 200, 202, 204, 206, 208, 210, 212, 214, 
216, 218, 220, 222, 224, 226, 228, 230, 232, 234, 236, 238, 240, 242, 244, 246, 248, 250, 252, 254, 
256, 258, 260, 262, 264, 266, 268, 270, 272, 274, 276, 278, 280, 282, 284, 286, 288, 290, 292, 294, 
296, 298, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 326, 328, 330, 332, 334, 
336, 338, 340, 342, 344, 346, 348, 350, 352, 354, 356, 358, 360, 362, 364, 366, 368, 370, 372, 374, 
376, 378, 380, 382, 384, 386, 388, 390, 392, 394, 396, 398, 400, 402, 404, 406, 408, 410, 412, 414, 
416, 418, 420, 422, 424, 426, 428, 430, 432, 434, 436, 438, 440, 442, 444, 446, 448, 450, 452, 454, 
456, 458, 460, 462, 464, 466, 468, 470, 472, 474, 476, 478, 480, 482, 484, 486, 488, 490, 492, 494, 
496, 498, 500, 502, 504, 506, 508, 510, 512, 514, 516, 518, 520, 522, 524, 526, 528, 530, 532, 534, 
536, 538, 540, 542, 544, 546, 548, 550, 552, 554, 556, 558, 560, 562, 564, 566, 568, 570, 572, 574, 
576, 578, 580, 582, 584, 586, 588, 590, 592, 594, 596, 598, 600, 602, 604, 606, 608, 610, 612, 614, 
616, 618, 620, 622, 624, 626, 628, 630, 632, 634, 636, 638, 640, 642, 644, 646, 648, 650, 652, 654, 
656, 658, 660, 662, 664, 666, 668, 670, 672, 674, 676, 678, 680, 682, 684, 686, 688, 690, 692, 694, 
696, 698, 700, 702, 704, 706, 708, 710, 712, 714, 716, 718, 720, 722, 724, 726, 728, 730, 732, 734, 
736, 738, 740, 742, 744, 746, 748, 750, 752, 754, 756, 758, 760, 762, 764, 766, 768, 770, 772, 774, 
776, 778, 780, 782, 784, 786, 788, 790, 792, 794, 796, 798, 800, 802, 804, 806, 808, 810, 812, 814, 
816, 818, 820, 822, 824, 826, 828, 830, 832, 834, 836, 838, 840, 842, 844, 846, 848, 850, 852, 854, 
856, 858, 860, 862, 864, 866, 868, 870, 872, 874, 876, 878, 880, 882, 884, 886, 888, 890, and 892. 

7. An antibody which binds to a protein according to any one of claims 4 to 6. 

8. A nucleic acid molecule which encodes a protein according to any one of claims 4 to 6. 

9. A nucleic acid molecule according to claim 8, comprising a nucleotide sequence selected 
from the group consisting of SEQ IDs 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 
37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 
89, 91, 93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 
131, 133, 135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 
171, 173, 175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 
211, 213, 215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 
251, 253, 255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 
291, 293, 295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 
331, 333, 335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 
371, 373, 375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409, 
411, 413, 415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449, 
451, 453, 455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489, 
491, 493, 495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 
531, 533, 535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569, 
571, 573, 575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609, 
611, 613, 615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 
651, 653, 655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 
691, 693, 695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 
731, 733, 735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 
771, 773, 775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 
811, 813, 815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849, 
851, 853, 855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, 
and 891. 

10. A nucleic acid molecule comprising a fragment of a nucleotide sequence selected from the 
group consisting of SEQ IDs 1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 
41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 
93, 95, 97, 99, 101, 103, 105, 107, 109, 111, 113, 115, 117, 119, 121, 123, 125, 127, 129, 131, 133, 
135, 137, 139, 141, 143, 145, 147, 149, 151, 153, 155, 157, 159, 161, 163, 165, 167, 169, 171, 173, 
175, 177, 179, 181, 183, 185, 187, 189, 191, 193, 195, 197, 199, 201, 203, 205, 207, 209, 211, 213, 
215, 217, 219, 221, 223, 225, 227, 229, 231, 233, 235, 237, 239, 241, 243, 245, 247, 249, 251, 253, 
255, 257, 259, 261, 263, 265, 267, 269, 271, 273, 275, 277, 279, 281, 283, 285, 287, 289, 291, 293, 
295, 297, 299, 301, 303, 305, 307, 309, 311, 313, 315, 317, 319, 321, 323, 325, 327, 329, 331, 333, 
335, 337, 339, 341, 343, 345, 347, 349, 351, 353, 355, 357, 359, 361, 363, 365, 367, 369, 371, 373, 
375, 377, 379, 381, 383, 385, 387, 389, 391, 393, 395, 397, 399, 401, 403, 405, 407, 409, 411, 413, 
415, 417, 419, 421, 423, 425, 427, 429, 431, 433, 435, 437, 439, 441, 443, 445, 447, 449, 451, 453, 
455, 457, 459, 461, 463, 465, 467, 469, 471, 473, 475, 477, 479, 481, 483, 485, 487, 489, 491, 493, 
495, 497, 499, 501, 503, 505, 507, 509, 511, 513, 515, 517, 519, 521, 523, 525, 527, 529, 531, 533, 
535, 537, 539, 541, 543, 545, 547, 549, 551, 553, 555, 557, 559, 561, 563, 565, 567, 569, 571, 573, 
575, 577, 579, 581, 583, 585, 587, 589, 591, 593, 595, 597, 599, 601, 603, 605, 607, 609, 611, 613, 
615, 617, 619, 621, 623, 625, 627, 629, 631, 633, 635, 637, 639, 641, 643, 645, 647, 649, 651, 653, 
655, 657, 659, 661, 663, 665, 667, 669, 671, 673, 675, 677, 679, 681, 683, 685, 687, 689, 691, 693, 
695, 697, 699, 701, 703, 705, 707, 709, 711, 713, 715, 717, 719, 721, 723, 725, 727, 729, 731, 733, 
735, 737, 739, 741, 743, 745, 747, 749, 751, 753, 755, 757, 759, 761, 763, 765, 767, 769, 771, 773, 
775, 777, 779, 781, 783, 785, 787, 789, 791, 793, 795, 797, 799, 801, 803, 805, 807, 809, 811, 813, 
815, 817, 819, 821, 823, 825, 827, 829, 831, 833, 835, 837, 839, 841, 843, 845, 847, 849, 851, 853, 
855, 857, 859, 861, 863, 865, 867, 869, 871, 873, 875, 877, 879, 881, 883, 885, 887, 889, and 891. 

11. A nucleic acid molecule comprising a nucleotide sequence complementary to a nucleic acid 
molecule according to any one of claims 8 to 10. 

12. A nucleic acid molecule comprising a nucleotide sequence having 50% or greater sequence 
identity to a nucleic acid molecule according to any one of claims 8-11. 

13. A nucleic acid molecule which can hybridise to a nucleic acid molecule according to any 
one of claims 8-12 under high stringency conditions. 

14. A composition comprising a protein, a nucleic acid molecule, or an antibody according to 
any preceding claim. 

15. A composition according to claim 14 being a vaccine composition or a diagnostic 
composition. 

16. A composition according to claim 14 or claim 15 for use as a pharmaceutical. 

17. The use of a composition according to claim 14 in the manufacture of a medicament for the 
treatment or prevention of infection due to Neisserial bacteria. 

ABSTRACT 

The invention provides proteins from Neisseria meningitidis (strains A & B) and from Neisseria 
gonorrhoeae, including amino acid sequences, the corresponding nucleotide sequences, expression 
data, and serological data. The proteins are useful antigens for vaccines, immunogenic 
compositions, and/or diagnostics. 

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



In Re Application of: 

Enzo Scalato, Vega Masignani, Rino Rappuoli, 
Mariagrazia Pizza, and Guido Grandi 

Group Art Unit: not assigned 

Examiner: not assigned 

For: Neisserial Antigens 



DECLARATION AND POWER OF ATTORNEY 



As a below named inventor, I hereby declare that: 

My residence, post office address and citizenship are as stated below next to my name; and 

I believe that I am the original, first and sole inventor (if only one name is listed below) or an 
original, first and joint inventor (if plural names are listed below) of the subject matter which 
is claimed and for which a 

☒ Utility Patent □ Design Patent 

is sought on the invention, whose title appears above, the specification of which: 



is attached hereto. 

□ was filed on as Serial No. 

□ said application having been amended on _ 



I hereby state that I have reviewed and understand the contents of the above-identified 
specification, including the claims, as amended by any amendment referred to above. 

I acknowledge the duty to disclose to the U.S. Patent and Trademark Office all information 
known to be material to the patentability of this application in accordance with 37 CFR § 
1.56. 



I hereby claim foreign priority benefits under 35 U.S.C. § 119(a-d) of any foreign 



DOCKET NO. CHIR-0160 
(356.001) 






PATENT 



application(s) for patent or inventor's certificate listed below and have also identified below 
any foreign application for patent or inventor's certificate having a filing date before that of 
any application on which priority is claimed: 

Country      Serial Number      Date Filed         Priority Claimed (If X'd) 

PCT          PCT/IB98/01665     October 9, 1998    □ 



I hereby claim the benefit under 35 U.S.C. § 120 of any United States application(s) 
listed below and, insofar as the subject matter of each of the claims of this application is not 
disclosed in the prior United States application in the manner provided by the first paragraph 
of 35 U.S.C. § 112, I acknowledge the duty to disclose to the U.S. Patent and Trademark 
Office all information known to be material to patentability as defined in 37 CFR § 1.56 
which became available between the filing date of the prior application and the national or 
PCT international filing date of this application: 

Serial Number Date Filed Patented/Pending/Abandoned 



I hereby claim the benefit under 35 U.S.C. § 119(e) of any United States provisional 
application(s) listed below: 

Serial Number Date Filed 



I hereby appoint the following persons of the firm of WOODCOCK WASHBURN 
KURTZ MACKIEWICZ & NORRIS LLP, One Liberty Place - 46th Floor, Philadelphia, 
Pennsylvania 19103 as attorneys and/or agents to prosecute this application and to transact all 
business in the Patent and Trademark Office connected therewith: 

Robert B. Washburn       Registration No. 16,574 
Richard E. Kurtz         Registration No. 19,263 
John J. Mackiewicz       Registration No. 19,709 
Norman L. Norris         Registration No. 24,196 
Albert W. Preston, Jr.   Registration No. 25,366 
Dale M. Heist            Registration No. 28,425 
Philip S. Johnson        Registration No. 27,200 
John W. Caldwell         Registration No. 28,937 
Gary H. Levin            Registration No. 28,734 
Steven J. Rocci          Registration No. 30,489 
Dianne B. Elderkin       Registration No. 28,598 
Francis A. Paintin       Registration No. 19,386 
John P. Donohue, Jr.     Registration No. 29,916 
Henrik D. Parker         Registration No. 31,863 
Suzanne E. Miller        Registration No. 32,279 
Lynn B. Morreale         Registration No. 32,842 
Mark DeLuca              Registration No. 33,229 
Joseph Lucci             Registration No. 33,307 
Michael P. Dunnam        Registration No. 32,611 
Michael D. Stein         Registration No. 34,734 
Albert J. Marcellino     Registration No. 34,664 
David R. Bailey          Registration No. 35,057 
Doreen Yatko Trujillo    Registration No. 35,719 
Barbara L. Mullin        Registration No. 38,250 
Kevin M. Flannery        Registration No. 35,871 
Lynn A. Malinoski        Registration No. 38,788 
Lori Y. Beardell         Registration No. 34,293 
Michael P. Straher       Registration No. 38,325 
David A. Cherry          Registration No. 35,099 
Anthony J. Rossi         Registration No. 24,053 
Michael J. Swope         Registration No. 38,041 
Michael J. Bonella       Registration No. 41,628 
Harold H. Fullmer        Registration No. 42,560 
William R. Richter       Registration No. 43,879 
John E. McGlynn          Registration No. 42,863 
Kimberly R. Hild         Registration No. 39,224 
Lawrence A. Aaronson     Registration No. 38,369 
Jonathan M. Waldman      Registration No. 40,861 
Paul K. Legaard          Registration No. 38,534 
Chad Ziegler             Registration No. 44,273 
David N. Farsiou         Registration No. 44,104 
Maureen Gibbons          Registration No. 44,121 
Steven H. Meyer          Registration No. 37,189 
John M. Paolino          Registration No. 40,340 
Joseph R. Condo          Registration No. 42,431 
Michael K. Jones         Registration No. 41,100 
Frank T. Carroll         Registration No. 42,392 
Rena Patel               Registration No. 41,412 
Mark J. Rosen            Registration No. 39,822 
Gregory L. Hillyer       Registration No. 44,154 
Maria M. Kourtakis       Registration No. 41,126 



In addition to the attorneys listed above, the undersigned hereby appoints the 
attorneys listed below of CHIRON CORPORATION, 4560 Horton Street, Emeryville, CA 
94608-2916 as attorneys for applicants, with full power of substitution and revocation, to 
prosecute this application and to transact all business in the Patent and Trademark Office 
connected therewith: 

Alisa A. Harbin Registration No. 33,895 

Robert P. Blackburn Registration No. 30,447 

Joseph H. Guth Registration No. 31,261 

Kenneth M. Goldman Registration No. 34,174. 

Please send all future correspondence to: 

Alisa A. Harbin, Esq. 
Intellectual Property 
Chiron Corporation 
4560 Horton Street 
Emeryville, CA 94608-2916 
Telephone: (510) 923-2708 

Please direct all telephone calls to: 
Mark J. Rosen 

WOODCOCK WASHBURN KURTZ 
MACKIEWICZ & NORRIS LLP 

One Liberty Place - 46th Floor 
Philadelphia PA 19103 
Telephone No.: (215) 568-3100 
Facsimile No.: (215) 568-3439 

I hereby declare that all statements made herein of my own knowledge are true and that all 
statements made on information and belief are believed to be true; and further that these 
statements were made with the knowledge that willful false statements and the like so made 
are punishable by fine or imprisonment, or both, under Section 1001 of Title 18 of the 
United States Code and that such willful false statements may jeopardize the validity of the 
application or any patent issued thereon. 



Name: 

Enzo Scalato 

Mailing Address: 

Chiron SpA 
Via Fiorentina 
53100 Siena 
Italy 

Signature 
Date of Signature: 

Citizenship: Italy 

City/State of Actual Residence: 

Colle Val d'Elsa (SI), Italy 





Name: 

Masignani Vega 

Mailing Address: 

Chiron SpA 
Via Fiorentina 
53100 Siena 
Italy 

Signature 
Date of Signature: 

Citizenship: Italy 

City/State of Actual Residence: 

Siena, Italy 

Name: 

Rino Rappuoli 

Mailing Address: 

Chiron SpA 
Via Fiorentina 
53100 Siena 
Italy 

Signature 
Date of Signature: 

Citizenship: Italy 

City/State of Actual Residence: 

Berardenga (SI), Italy 





Name: 

Mariagrazia Pizza 




Mailing Address: 

Chiron SpA 
Via Fiorentina 
53100 Siena 
Italy 


Signature 
Date of Signature: 


Citizenship: Italy 


City/State of Actual Residence: 

Sienna, Italy 





Name: 

Guido Grandi 




Mailing Address: 

Chiron SpA 
Via Fiorentina 
53100 Siena 
Italy 


Signature 
Date of Signature: 


Citizenship: Italy 


City/State of Actual Residence: 

Segrate (MI), Italy 



